Re: [PATCH] isofs: fix High Sierra dirent flag accesses

2020-06-22 Thread Matthew Wilcox
On Sun, Jun 21, 2020 at 07:08:17AM +0300, Egor Chelak wrote:
> The flags byte of the dirent was accessed as de->flags[0] in a couple of
> places, and not as de->flags[-sbi->s_high_sierra], which is how it's
> accessed elsewhere. This caused a bug, where some files on an HSF disc
> could be inaccessible.
> 
> For context, here is the difference between HSF dirents and ISO dirents:
> Offset  | High Sierra | ISO-9660       | struct iso_directory_record
> Byte 24 | Flags       | mtime timezone | de->date[6] (de->flags[-1])
> Byte 25 | Reserved    | Flags          | de->flags[0]

Also, ew.  Why on earth do we do 'de->flags[-sbi->s_high_sierra]'?
I'm surprised we don't have any tools that warn about references outside
an array.  I would do this as ...

static inline u8 de_flags(struct isofs_sb_info *sbi,
			  struct iso_directory_record *de)
{
	if (sbi->s_high_sierra)
		return de->date[6];
	return de->flags[0];
}
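
For illustration, a call site such as the one in get_acorn_filename()
would then read (a sketch, assuming the helper above is visible there and
sbi has been fetched with ISOFS_SB(inode->i_sb)):

	if (((de_flags(sbi, de) & 2) == 0) && (chr[13] == 0xff)
			&& ((chr[12] & 0xf0) == 0xf0)) {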


[ANNOUNCE] 4.19.127-rt55

2020-06-22 Thread Tom Zanussi
Hello RT Folks!

I'm pleased to announce the 4.19.127-rt55 stable release.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v4.19-rt
  Head SHA1: f297d3d16170bd3af56f7310963c727ce2cab5c7

Or to build 4.19.127-rt55 directly, the following patches should be applied:

  https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.19.tar.xz

  https://www.kernel.org/pub/linux/kernel/v4.x/patch-4.19.127.xz

  
https://www.kernel.org/pub/linux/kernel/projects/rt/4.19/patch-4.19.127-rt55.patch.xz


You can also build from 4.19.127-rt54 by applying the incremental patch:

  
https://www.kernel.org/pub/linux/kernel/projects/rt/4.19/incr/patch-4.19.127-rt54-rt55.patch.xz

Enjoy!

   Tom

Changes from v4.19.127-rt54:
---

Kevin Hao (1):
  mm: slub: Always flush the delayed empty slubs in flush_all()

Sebastian Andrzej Siewior (1):
  fs/dcache: Include swait.h header

Tom Zanussi (2):
  tasklet: Fix UP case for tasklet CHAINED state
  Linux 4.19.127-rt55
---
 fs/proc/base.c   | 1 +
 kernel/softirq.c | 6 ++++++
 localversion-rt  | 2 +-
 mm/slub.c        | 3 ---
 4 files changed, 8 insertions(+), 4 deletions(-)
---
diff --git a/fs/proc/base.c b/fs/proc/base.c
index a45d4d640f01..56b1c4f1e8c0 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -95,6 +95,7 @@
 #include 
 #include 
 #include 
+#include <linux/swait.h>
 #include "internal.h"
 #include "fd.h"
 
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 73dae64bfc9c..9bad7a16dc61 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -947,10 +947,12 @@ static void __tasklet_schedule_common(struct tasklet_struct *t,
 * is locked before adding it to the list.
 */
if (test_bit(TASKLET_STATE_SCHED, &t->state)) {
+#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
if (test_and_set_bit(TASKLET_STATE_CHAINED, &t->state)) {
tasklet_unlock(t);
return;
}
+#endif
t->next = NULL;
*head->tail = t;
head->tail = &(t->next);
@@ -1044,7 +1046,11 @@ static void tasklet_action_common(struct softirq_action *a,
 again:
t->func(t->data);
 
+#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
while (cmpxchg(&t->state, TASKLET_STATEF_RC, 0) != TASKLET_STATEF_RC) {
+#else
+   while (!tasklet_tryunlock(t)) {
+#endif
/*
 * If it got disabled meanwhile, bail out:
 */
diff --git a/localversion-rt b/localversion-rt
index 3165a8781ff5..51b05e9abe6f 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt54
+-rt55
diff --git a/mm/slub.c b/mm/slub.c
index d243c6ef7fc9..a9473bbb1338 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2341,9 +2341,6 @@ static void flush_all(struct kmem_cache *s)
for_each_online_cpu(cpu) {
struct slub_free_list *f;
 
-   if (!has_cpu_slab(cpu, s))
-   continue;
-
f = &per_cpu(slub_free_list, cpu);
raw_spin_lock_irq(&f->lock);
list_splice_init(&f->list, &tofree);



Re: [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall

2020-06-22 Thread Kees Cook
On Mon, Jun 22, 2020 at 10:07:37PM +0200, Jann Horn wrote:
> On Mon, Jun 22, 2020 at 9:31 PM Kees Cook  wrote:
> > This provides the ability for architectures to enable kernel stack base
> > address offset randomization. This feature is controlled by the boot
> > param "randomize_kstack_offset=on/off", with its default value set by
> > CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.
> [...]
> > +#define add_random_kstack_offset() do {				\
> > +	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
> > +				&randomize_kstack_offset)) {		\
> > +		u32 offset = this_cpu_read(kstack_offset);	\
> > +		u8 *ptr = __builtin_alloca(offset & 0x3FF);	\
> > +		asm volatile("" : "=m"(*ptr));			\
> > +	}							\
> > +} while (0)
> 
> clang generates better code here if the mask is stack-aligned -
> otherwise it needs to round the stack pointer / the offset:

Interesting. I was hoping to avoid needing to know the architecture
stack alignment (leaving it up to the compiler).

> 
> $ cat alloca_align.c
> #include <alloca.h>
> void callee(void);
> 
> void alloca_blah(unsigned long rand) {
>   asm volatile(""::"r"(alloca(rand & MASK)));
>   callee();
> }
> $ clang -O3 -c -o alloca_align.o alloca_align.c -DMASK=0x3ff
> $ objdump -d alloca_align.o
> [...]
>    0:	55                   	push   %rbp
>    1:	48 89 e5             	mov    %rsp,%rbp
>    4:	81 e7 ff 03 00 00    	and    $0x3ff,%edi
>    a:	83 c7 0f             	add    $0xf,%edi
>    d:	83 e7 f0             	and    $0xfffffff0,%edi
>   10:	48 89 e0             	mov    %rsp,%rax
>   13:	48 29 f8             	sub    %rdi,%rax
>   16:	48 89 c4             	mov    %rax,%rsp
>   19:	e8 00 00 00 00       	callq  1e <alloca_blah+0x1e>
>   1e:	48 89 ec             	mov    %rbp,%rsp
>   21:	5d                   	pop    %rbp
>   22:	c3                   	retq
> $ clang -O3 -c -o alloca_align.o alloca_align.c -DMASK=0x3f0
> $ objdump -d alloca_align.o
> [...]
>    0:	55                   	push   %rbp
>    1:	48 89 e5             	mov    %rsp,%rbp
>    4:	48 89 e0             	mov    %rsp,%rax
>    7:	81 e7 f0 03 00 00    	and    $0x3f0,%edi
>    d:	48 29 f8             	sub    %rdi,%rax
>   10:	48 89 c4             	mov    %rax,%rsp
>   13:	e8 00 00 00 00       	callq  18 <alloca_blah+0x18>
>   18:	48 89 ec             	mov    %rbp,%rsp
>   1b:	5d                   	pop    %rbp
>   1c:	c3                   	retq
> $
> 
> (From a glance at the assembly, gcc seems to always assume that the
> length may be misaligned.)

Right -- this is why I didn't bother with it, since it didn't seem to
notice what I'd already done to the alloca() argument. (But from what I
could measure on cycle counts, the additional ALU didn't seem to really
make much difference ... it _would_ be nice to avoid it, of course.)

> Maybe this should be something along the lines of
> __builtin_alloca(offset & (0x3ff & ARCH_STACK_ALIGN_MASK)) (with
> appropriate definitions of the stack alignment mask depending on the
> architecture's choice of stack alignment for kernel code).

Is that explicitly selected anywhere in the kernel? I thought the
alignment was left up to the compiler (as in I've seen bugs fixed where
the kernel had to deal with the alignment choices the compiler was
making...)
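
For reference, a minimal sketch of what Jann is suggesting
(ARCH_STACK_ALIGN_MASK is a hypothetical per-architecture constant, shown
here for a 16-byte kernel stack alignment; it is not an existing kernel
macro):

	/* Sketch only: with the low bits pre-cleared, the compiler can
	 * drop the add/and rounding seen in the first objdump listing. */
	#define ARCH_STACK_ALIGN_MASK	(~0xfUL)

	#define add_random_kstack_offset() do {				\
		if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
					&randomize_kstack_offset)) {	\
			u32 offset = this_cpu_read(kstack_offset);	\
			u8 *ptr = __builtin_alloca(offset & 0x3ff &	\
						   ARCH_STACK_ALIGN_MASK); \
			asm volatile("" : "=m"(*ptr));			\
		}							\
	} while (0)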

-- 
Kees Cook


Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations

2020-06-22 Thread Shakeel Butt
On Mon, Jun 22, 2020 at 2:15 PM Roman Gushchin  wrote:
>
> On Mon, Jun 22, 2020 at 02:04:29PM -0700, Shakeel Butt wrote:
> > On Mon, Jun 22, 2020 at 1:37 PM Roman Gushchin  wrote:
> > >
> > > On Mon, Jun 22, 2020 at 12:21:28PM -0700, Shakeel Butt wrote:
> > > > On Mon, Jun 8, 2020 at 4:07 PM Roman Gushchin  wrote:
> > > > >
> > > > > Instead of having two sets of kmem_caches: one for system-wide and
> > > > > non-accounted allocations and the second one shared by all accounted
> > > > > allocations, we can use just one.
> > > > >
> > > > > The idea is simple: space for obj_cgroup metadata can be allocated
> > > > > on demand and filled only for accounted allocations.
> > > > >
> > > > > It allows us to remove a bunch of code that is required to handle
> > > > > kmem_cache clones for accounted allocations. There is no more need
> > > > > to create them, accumulate statistics, propagate attributes, etc.
> > > > > It's quite a significant simplification.
> > > > >
> > > > > Also, because the total number of slab_caches is reduced by almost
> > > > > half (not all kmem_caches have a memcg clone), some additional memory
> > > > > savings are expected. On my devvm it additionally saves about 3.5%
> > > > > of slab memory.
> > > > >
> > > > > Suggested-by: Johannes Weiner 
> > > > > Signed-off-by: Roman Gushchin 
> > > > > Reviewed-by: Vlastimil Babka 
> > > > > ---
> > > > [snip]
> > > > >  static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
> > > > >   struct obj_cgroup *objcg,
> > > > > - size_t size, void **p)
> > > > > + gfp_t flags, size_t size,
> > > > > + void **p)
> > > > >  {
> > > > > struct page *page;
> > > > > unsigned long off;
> > > > > size_t i;
> > > > >
> > > > > +   if (!objcg)
> > > > > +   return;
> > > > > +
> > > > > +   flags &= ~__GFP_ACCOUNT;
> > > > > for (i = 0; i < size; i++) {
> > > > > if (likely(p[i])) {
> > > > > page = virt_to_head_page(p[i]);
> > > > > +
> > > > > +   if (!page_has_obj_cgroups(page) &&
> > > >
> > > > The page is already linked into the kmem_cache, don't you need
> > > > synchronization for memcg_alloc_page_obj_cgroups().
> > >
> > > Hm, yes, in theory we need it. I guess the reason behind why I've never 
> > > seen any issues
> > > here is the SLUB percpu partial list.
> > >
> > > So in theory we need something like:
> > >
> > > diff --git a/mm/slab.h b/mm/slab.h
> > > index 0a31600a0f5c..44bf57815816 100644
> > > --- a/mm/slab.h
> > > +++ b/mm/slab.h
> > > @@ -237,7 +237,10 @@ static inline int memcg_alloc_page_obj_cgroups(struct page *page,
> > > if (!vec)
> > > return -ENOMEM;
> > >
> > > -   page->obj_cgroups = (struct obj_cgroup **) ((unsigned long)vec | 0x1UL);
> > > +   if (cmpxchg(&page->obj_cgroups, 0,
> > > +   (struct obj_cgroup **) ((unsigned long)vec | 0x1UL)))
> > > +   kfree(vec);
> > > +
> > > return 0;
> > >  }
> > >
> > >
> > > But I wonder if we might put it under #ifdef CONFIG_SLAB?
> > > Or any other ideas how to make it less expensive?
> > >
> > > > What's the reason to remove this from charge_slab_page()?
> > >
> > > Because at charge_slab_page() we don't know if we'll ever need
> > > page->obj_cgroups. Some caches might have only few or even zero
> > > accounted objects.
> > >
> >
> > If slab_pre_alloc_hook() returns a non-NULL objcg then we definitely
> > need page->obj_cgroups.  The charge_slab_page() happens between
> > slab_pre_alloc_hook() & slab_post_alloc_hook(), so, we should be able
> > to tell if page->obj_cgroups is needed.
>
> Yes, but the opposite is not always true: we can reuse the existing page
> without allocated page->obj_cgroups. In this case charge_slab_page() is
> not involved at all.
>

Hmm yeah, you are right. I missed that.

>
> Or do you mean that we can minimize the amount of required synchronization
> by allocating some obj_cgroups vectors from charge_slab_page()?

One optimization would be to always pre-allocate page->obj_cgroups for
kmem_caches with SLAB_ACCOUNT.
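
A rough sketch of that idea (the hook placement and the (page, s, gfp)
signature of memcg_alloc_page_obj_cgroups() are assumptions, not the
posted patch):

	/* Hypothetical: allocate the vector eagerly for SLAB_ACCOUNT
	 * caches, leaving the racy lazy path only for opportunistic
	 * __GFP_ACCOUNT allocations from other caches. */
	static __always_inline void charge_slab_page(struct page *page,
						     gfp_t gfp, int order,
						     struct kmem_cache *s)
	{
		if (s->flags & SLAB_ACCOUNT)
			memcg_alloc_page_obj_cgroups(page, s, gfp);

		/* ... existing vmstat accounting unchanged ... */
	}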


Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement

2020-06-22 Thread Rick Lindsley



On Mon, Jun 22, 2020 at 01:48:45PM -0400, Tejun Heo wrote:


It should be obvious that representing each consecutive memory range with a
separate directory entry is far from an optimal way of representing
something like this. It's outright silly.


On 6/22/20 11:03 AM, Greg Kroah-Hartman wrote:


I agree.  And again, Ian, you are just "kicking the problem down the
road" if we accept these patches.  Please fix this up properly so that
this interface is correctly fixed to not do looney things like this.


Given that we cannot change the underlying machine representation of this hardware, what 
do you (all, not just you Greg) consider to be "properly"?

Rick



Re: [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall

2020-06-22 Thread Kees Cook
On Mon, Jun 22, 2020 at 12:40:49PM -0700, Randy Dunlap wrote:
> On 6/22/20 12:31 PM, Kees Cook wrote:
> > This provides the ability for architectures to enable kernel stack base
> > address offset randomization. This feature is controlled by the boot
> > param "randomize_kstack_offset=on/off", with its default value set by
> > CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.
> > 
> > Co-developed-by: Elena Reshetova 
> > Signed-off-by: Elena Reshetova 
> > Link: 
> > https://lore.kernel.org/r/20190415060918.3766-1-elena.reshet...@intel.com
> > Signed-off-by: Kees Cook 
> > ---
> >  Makefile |  4 
> >  arch/Kconfig | 23 ++
> >  include/linux/randomize_kstack.h | 40 
> >  init/main.c  | 23 ++
> >  4 files changed, 90 insertions(+)
> >  create mode 100644 include/linux/randomize_kstack.h
> 
> Please add documentation for the new kernel boot parameter to
> Documentation/admin-guide/kernel-parameters.txt.

Oops, yes. Thanks for the reminder!

(I wonder if checkpatch can notice "+early_param" and suggest the Doc
update hmmm)

-- 
Kees Cook


Re: [PATCH v2] clk: at91: add sama5d3 pmc driver

2020-06-22 Thread Ahmad Fatoum
Hello Alexandre,

On 1/10/20 11:30 PM, Alexandre Belloni wrote:
> Add a driver for the PMC clocks of the sama5d3.
> 
> Signed-off-by: Alexandre Belloni 
> ---
> Changes in v2:
>  - fixed the output range for the peripheral clocks
>  - added a comment why the PMC driver can't be a platform driver
> 
>  drivers/clk/at91/Makefile  |   1 +
>  drivers/clk/at91/sama5d3.c | 240 +
>  2 files changed, 241 insertions(+)
>  create mode 100644 drivers/clk/at91/sama5d3.c
> 
> diff --git a/drivers/clk/at91/Makefile b/drivers/clk/at91/Makefile
> index 3732241352ce..e3be7f40f79e 100644
> --- a/drivers/clk/at91/Makefile
> +++ b/drivers/clk/at91/Makefile
> @@ -17,5 +17,6 @@ obj-$(CONFIG_HAVE_AT91_I2S_MUX_CLK) += clk-i2s-mux.o
>  obj-$(CONFIG_HAVE_AT91_SAM9X60_PLL)  += clk-sam9x60-pll.o
>  obj-$(CONFIG_SOC_AT91SAM9) += at91sam9260.o at91sam9rl.o at91sam9x5.o
>  obj-$(CONFIG_SOC_SAM9X60) += sam9x60.o
> +obj-$(CONFIG_SOC_SAMA5D3) += sama5d3.o
>  obj-$(CONFIG_SOC_SAMA5D4) += sama5d4.o
>  obj-$(CONFIG_SOC_SAMA5D2) += sama5d2.o
> diff --git a/drivers/clk/at91/sama5d3.c b/drivers/clk/at91/sama5d3.c
> new file mode 100644
> index ..88506f909c08
> --- /dev/null
> +++ b/drivers/clk/at91/sama5d3.c
> @@ -0,0 +1,240 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#include "pmc.h"
> +
> +static const struct clk_master_characteristics mck_characteristics = {
> + .output = { .min = 0, .max = 166000000 },
> + .divisors = { 1, 2, 4, 3 },
> +};
> +
> +static u8 plla_out[] = { 0 };
> +
> +static u16 plla_icpll[] = { 0 };
> +
> +static const struct clk_range plla_outputs[] = {
> + { .min = 400000000, .max = 1000000000 },
> +};
> +
> +static const struct clk_pll_characteristics plla_characteristics = {
> + .input = { .min = 8000000, .max = 50000000 },
> + .num_output = ARRAY_SIZE(plla_outputs),
> + .output = plla_outputs,
> + .icpll = plla_icpll,
> + .out = plla_out,
> +};
> +
> +static const struct clk_pcr_layout sama5d3_pcr_layout = {
> + .offset = 0x10c,
> + .cmd = BIT(12),
> + .pid_mask = GENMASK(6, 0),
> + .div_mask = GENMASK(17, 16),
> +};
> +
> +static const struct {
> + char *n;
> + char *p;
> + u8 id;
> +} sama5d3_systemck[] = {
> + { .n = "ddrck", .p = "masterck", .id = 2 },
> + { .n = "lcdck", .p = "masterck", .id = 3 },
> + { .n = "smdck", .p = "smdclk",   .id = 4 },
> + { .n = "uhpck", .p = "usbck",.id = 6 },
> + { .n = "udpck", .p = "usbck",.id = 7 },
> + { .n = "pck0",  .p = "prog0",.id = 8 },
> + { .n = "pck1",  .p = "prog1",.id = 9 },
> + { .n = "pck2",  .p = "prog2",.id = 10 },
> +};
> +
> +static const struct {
> + char *n;
> + u8 id;
> + struct clk_range r;
> +} sama5d3_periphck[] = {
> + { .n = "dbgu_clk", .id = 2, },
> + { .n = "hsmc_clk", .id = 5, },
> + { .n = "pioA_clk", .id = 6, },
> + { .n = "pioB_clk", .id = 7, },
> + { .n = "pioC_clk", .id = 8, },
> + { .n = "pioD_clk", .id = 9, },
> + { .n = "pioE_clk", .id = 10, },
> + { .n = "usart0_clk", .id = 12, .r = { .min = 0, .max = 8300 }, },
> + { .n = "usart1_clk", .id = 13, .r = { .min = 0, .max = 8300 }, },
> + { .n = "usart2_clk", .id = 14, .r = { .min = 0, .max = 8300 }, },
> + { .n = "usart3_clk", .id = 15, .r = { .min = 0, .max = 8300 }, },
> + { .n = "uart0_clk", .id = 16, .r = { .min = 0, .max = 8300 }, },
> + { .n = "uart1_clk", .id = 17, .r = { .min = 0, .max = 8300 }, },
> + { .n = "twi0_clk", .id = 18, .r = { .min = 0, .max = 4150 }, },
> + { .n = "twi1_clk", .id = 19, .r = { .min = 0, .max = 4150 }, },
> + { .n = "twi2_clk", .id = 20, .r = { .min = 0, .max = 4150 }, },
> + { .n = "mci0_clk", .id = 21, },
> + { .n = "mci1_clk", .id = 22, },
> + { .n = "mci2_clk", .id = 23, },
> + { .n = "spi0_clk", .id = 24, .r = { .min = 0, .max = 16600 }, },
> + { .n = "spi1_clk", .id = 25, .r = { .min = 0, .max = 16600 }, },
> + { .n = "tcb0_clk", .id = 26, .r = { .min = 0, .max = 16600 }, },
> + { .n = "tcb1_clk", .id = 27, .r = { .min = 0, .max = 16600 }, },
> + { .n = "pwm_clk", .id = 28, },
> + { .n = "adc_clk", .id = 29, .r = { .min = 0, .max = 8300 }, },
> + { .n = "dma0_clk", .id = 30, },
> + { .n = "dma1_clk", .id = 31, },
> + { .n = "uhphs_clk", .id = 32, },
> + { .n = "udphs_clk", .id = 33, },
> + { .n = "macb0_clk", .id = 34, },
> + { .n = "macb1_clk", .id = 35, },
> + { .n = "lcdc_clk", .id = 36, },
> + { .n = "isi_clk", .id = 37, },
> + { .n = "ssc0_clk", .id = 38, .r = { .min = 0, .max = 8300 }, },
> + { .n = "ssc1_clk", .id = 39, .r = { .min = 0, .max = 8300 }, },
> + { .n = "can0_clk", .id = 40, .r = { .min = 0, .max = 8300 }, },
> + { .n = "can1_clk", .id = 41, .r = { .min = 0, .max = 8300 }, },
> + { .n = 

Re: [PATCH] checkpatch: fix CONST_STRUCT when const_structs.checkpatch is missing

2020-06-22 Thread Joe Perches
On Mon, 2020-06-22 at 21:48 +0100, Quentin Monnet wrote:
> Checkpatch reports warnings when some specific structs are not declared
> as const in the code. The list of structs to consider was initially
> defined in the checkpatch.pl script itself, but it was later moved to an
> external file (scripts/const_structs.checkpatch). This introduced two
> minor issues:
> 
> - When file scripts/const_structs.checkpatch is not present (for
>   example, if checkpatch is run outside of the kernel directory with the
>   "--no-tree" option), a warning is printed to stderr to tell the user
>   that "No structs that should be const will be found". This is fair,
>   but the warning is printed unconditionally, even if the option
>   "--ignore CONST_STRUCT" is passed. In the latter case, we explicitly
>   ask checkpatch to skip this check, so no warning should be printed.
> 
> - When scripts/const_structs.checkpatch is missing, or even when trying
>   to silence the warning by adding an empty file, $const_structs is set
>   to "", and the regex used for finding structs that should be const,
>   "$line =~ /\bstruct\s+($const_structs)\b(?!\s*\{)/)", matches all
>   structs found in the code, thus reporting a number of false positives.
> 
> Let's fix the first item by skipping scripts/const_structs.checkpatch
> processing if "CONST_STRUCT" checks are ignored, and the second one by
> skipping the test if $const_structs is an empty string.
> 
> Fixes: bf1fa1dae68e ("checkpatch: externalize the structs that should be 
> const")

Probably not worthy of a Fixes: line, as that's
generally used for backporting, but OK by me.

> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
[]
> @@ -781,8 +781,10 @@ sub read_words {
>  }
>  
>  my $const_structs = "";

This might be a tiny bit faster/less cpu using:

my $const_structs;

> -read_words(\$const_structs, $conststructsfile)
> -or warn "No structs that should be const will be found - file '$conststructsfile': $!\n";
> +if (show_type("CONST_STRUCT")) {
> + read_words(\$const_structs, $conststructsfile)
> + or warn "No structs that should be const will be found - file '$conststructsfile': $!\n";
> +}
>  
>  my $typeOtherTypedefs = "";
>  if (length($typedefsfile)) {
> @@ -6660,7 +6662,8 @@ sub process {
>  
>  # check for various structs that are normally const (ops, kgdb, device_tree)
>  # and avoid what seem like struct definitions 'struct foo {'
> - if ($line !~ /\bconst\b/ &&
> + if ($const_structs ne "" &&

instead testing

if (defined($const_structs) &&

> + $line !~ /\bconst\b/ &&
>   $line =~ /\bstruct\s+($const_structs)\b(?!\s*\{)/) {
>   WARN("CONST_STRUCT",
>"struct $1 should normally be const\n" . 
> $herecurr);



[PATCH 3/3] selftests: tpm: Use /bin/sh instead of /bin/bash

2020-06-22 Thread Jarkko Sakkinen
It's better to use /bin/sh instead of /bin/bash in order to run the tests
in the BusyBox shell.

Fixes: 6ea3dfe1e073 ("selftests: add TPM 2.0 tests")
Cc: sta...@vger.kernel.org
Cc: linux-integr...@vger.kernel.org
Cc: linux-kselft...@vger.kernel.org
Signed-off-by: Jarkko Sakkinen 
---
 tools/testing/selftests/tpm2/test_smoke.sh | 2 +-
 tools/testing/selftests/tpm2/test_space.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/tpm2/test_smoke.sh 
b/tools/testing/selftests/tpm2/test_smoke.sh
index 338d6b0272dc..1334e301d2a0 100755
--- a/tools/testing/selftests/tpm2/test_smoke.sh
+++ b/tools/testing/selftests/tpm2/test_smoke.sh
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/bin/sh
 # SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
 
 # Kselftest framework requirement - SKIP code is 4.
diff --git a/tools/testing/selftests/tpm2/test_space.sh 
b/tools/testing/selftests/tpm2/test_space.sh
index 847cabb20a5f..00259cb746cf 100755
--- a/tools/testing/selftests/tpm2/test_space.sh
+++ b/tools/testing/selftests/tpm2/test_space.sh
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/bin/sh
 # SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
 
 # Kselftest framework requirement - SKIP code is 4.
-- 
2.25.1



Re: [PATCH] isofs: fix High Sierra dirent flag accesses

2020-06-22 Thread Matthew Wilcox
On Sun, Jun 21, 2020 at 07:08:17AM +0300, Egor Chelak wrote:
> The flags byte of the dirent was accessed as de->flags[0] in a couple of
> places, and not as de->flags[-sbi->s_high_sierra], which is how it's
> accessed elsewhere. This caused a bug, where some files on an HSF disc
> could be inaccessible.

> +++ b/fs/isofs/dir.c
> @@ -50,6 +50,7 @@ int isofs_name_translate(struct iso_directory_record *de, char *new, struct inode *inode)
>  int get_acorn_filename(struct iso_directory_record *de,
>   char *retname, struct inode *inode)
>  {
> + struct isofs_sb_info *sbi = ISOFS_SB(inode->i_sb);
>   int std;
>   unsigned char *chr;
>   int retnamlen = isofs_name_translate(de, retname, inode);
> @@ -66,7 +67,7 @@ int get_acorn_filename(struct iso_directory_record *de,
>   return retnamlen;
>   if ((*retname == '_') && ((chr[19] & 1) == 1))
>   *retname = '!';
> - if (((de->flags[0] & 2) == 0) && (chr[13] == 0xff)
> + if (((de->flags[-sbi->s_high_sierra] & 2) == 0) && (chr[13] == 0xff)
>   && ((chr[12] & 0xf0) == 0xf0)) {
>   retname[retnamlen] = ',';
>   sprintf(retname+retnamlen+1, "%3.3x",

It's been about 22 years since I contributed the patch which added
support for the Acorn extensions ;-)  But I'm pretty sure that it's not
possible to have an Acorn CD-ROM that is also an HSF CD-ROM.  That is,
all Acorn formatted CD-ROMs are ISO-9660 compatible.  So I think this
chunk of the patch is not required.



[PATCH 2/3] selftests: tpm: Use 'test -e' instead of 'test -f'

2020-06-22 Thread Jarkko Sakkinen
'test -f' is suitable only for *regular* files. Use 'test -e' instead.
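
For illustration (not part of the patch): /dev/tpm0 is a character device
node, so

	test -f /dev/tpm0	# false: -f matches regular files only
	test -e /dev/tpm0	# true if the node exists, whatever its type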

Cc: Nikita Sobolev 
Cc: linux-integr...@vger.kernel.org
Cc: linux-kselft...@vger.kernel.org
Fixes: 5627f9cffee7 ("Kernel selftests: Add check if TPM devices are supported")
Signed-off-by: Jarkko Sakkinen 
---
 tools/testing/selftests/tpm2/test_smoke.sh | 2 +-
 tools/testing/selftests/tpm2/test_space.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/tpm2/test_smoke.sh 
b/tools/testing/selftests/tpm2/test_smoke.sh
index 79f8e9da5d21..338d6b0272dc 100755
--- a/tools/testing/selftests/tpm2/test_smoke.sh
+++ b/tools/testing/selftests/tpm2/test_smoke.sh
@@ -4,7 +4,7 @@
 # Kselftest framework requirement - SKIP code is 4.
 ksft_skip=4
 
-[ -f /dev/tpm0 ] || exit $ksft_skip
+[ -e /dev/tpm0 ] || exit $ksft_skip
 
 python -m unittest -v tpm2_tests.SmokeTest
 python -m unittest -v tpm2_tests.AsyncTest
diff --git a/tools/testing/selftests/tpm2/test_space.sh 
b/tools/testing/selftests/tpm2/test_space.sh
index 36c9d030a1c6..847cabb20a5f 100755
--- a/tools/testing/selftests/tpm2/test_space.sh
+++ b/tools/testing/selftests/tpm2/test_space.sh
@@ -4,6 +4,6 @@
 # Kselftest framework requirement - SKIP code is 4.
 ksft_skip=4
 
-[ -f /dev/tpmrm0 ] || exit $ksft_skip
+[ -e /dev/tpmrm0 ] || exit $ksft_skip
 
 python -m unittest -v tpm2_tests.SpaceTest
-- 
2.25.1



[PATCH 1/3] Revert "tpm: selftest: cleanup after unseal with wrong auth/policy test"

2020-06-22 Thread Jarkko Sakkinen
The reverted commit illegitimately uses tpm2-tools. External dependencies
are absolutely forbidden from these tests. There is also the problem that
clearing is not necessarily wanted behavior if the test/target computer is
not used solely for testing.

Fixes: a9920d3bad40 ("tpm: selftest: cleanup after unseal with wrong 
auth/policy test")
Cc: Tadeusz Struk 
Cc: sta...@vger.kernel.org
Cc: linux-integr...@vger.kernel.org
Cc: linux-kselft...@vger.kernel.org
Signed-off-by: Jarkko Sakkinen 
---
 tools/testing/selftests/tpm2/test_smoke.sh | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/tools/testing/selftests/tpm2/test_smoke.sh 
b/tools/testing/selftests/tpm2/test_smoke.sh
index 663062701d5a..79f8e9da5d21 100755
--- a/tools/testing/selftests/tpm2/test_smoke.sh
+++ b/tools/testing/selftests/tpm2/test_smoke.sh
@@ -8,8 +8,3 @@ ksft_skip=4
 
 python -m unittest -v tpm2_tests.SmokeTest
 python -m unittest -v tpm2_tests.AsyncTest
-
-CLEAR_CMD=$(which tpm2_clear)
-if [ -n $CLEAR_CMD ]; then
-   tpm2_clear -T device
-fi
-- 
2.25.1



[PATCH 0/3] selftests: tpm: fixes

2020-06-22 Thread Jarkko Sakkinen
A few fixes for tools/testing/selftests/tpm.

Jarkko Sakkinen (3):
  Revert "tpm: selftest: cleanup after unseal with wrong auth/policy
test"
  selftests: tpm: Use 'test -e' instead of 'test -f'
  selftests: tpm: Use /bin/sh instead of /bin/bash

 tools/testing/selftests/tpm2/test_smoke.sh | 9 ++-------
 tools/testing/selftests/tpm2/test_space.sh | 4 ++--
 2 files changed, 4 insertions(+), 9 deletions(-)

-- 
2.25.1



Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement

2020-06-22 Thread Rick Lindsley

On 6/22/20 10:53 AM, Tejun Heo wrote:


I don't know. The above highlights the absurdity of the approach itself to
me. You seem to be aware of it too in writing: 250,000 "devices".


Just because it is absurd doesn't mean it wasn't built that way :)

I agree, and I'm trying to influence the next hardware design.  However, what's 
already out there is memory units that must be accessed in 256MB blocks.  If 
you want to remove/add a GB, that's really 4 blocks of memory you're 
manipulating, to the hardware.  Those blocks have to be registered and 
recognized by the kernel for that to work.

Rick



Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations

2020-06-22 Thread Roman Gushchin
On Mon, Jun 22, 2020 at 02:04:29PM -0700, Shakeel Butt wrote:
> On Mon, Jun 22, 2020 at 1:37 PM Roman Gushchin  wrote:
> >
> > On Mon, Jun 22, 2020 at 12:21:28PM -0700, Shakeel Butt wrote:
> > > On Mon, Jun 8, 2020 at 4:07 PM Roman Gushchin  wrote:
> > > >
> > > > Instead of having two sets of kmem_caches: one for system-wide and
> > > > non-accounted allocations and the second one shared by all accounted
> > > > allocations, we can use just one.
> > > >
> > > > The idea is simple: space for obj_cgroup metadata can be allocated
> > > > on demand and filled only for accounted allocations.
> > > >
> > > > It allows us to remove a bunch of code that is required to handle
> > > > kmem_cache clones for accounted allocations. There is no more need
> > > > to create them, accumulate statistics, propagate attributes, etc.
> > > > It's quite a significant simplification.
> > > >
> > > > Also, because the total number of slab_caches is reduced by almost
> > > > half (not all kmem_caches have a memcg clone), some additional memory
> > > > savings are expected. On my devvm it additionally saves about 3.5%
> > > > of slab memory.
> > > >
> > > > Suggested-by: Johannes Weiner 
> > > > Signed-off-by: Roman Gushchin 
> > > > Reviewed-by: Vlastimil Babka 
> > > > ---
> > > [snip]
> > > >  static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
> > > >   struct obj_cgroup *objcg,
> > > > - size_t size, void **p)
> > > > + gfp_t flags, size_t size,
> > > > + void **p)
> > > >  {
> > > > struct page *page;
> > > > unsigned long off;
> > > > size_t i;
> > > >
> > > > +   if (!objcg)
> > > > +   return;
> > > > +
> > > > +   flags &= ~__GFP_ACCOUNT;
> > > > for (i = 0; i < size; i++) {
> > > > if (likely(p[i])) {
> > > > page = virt_to_head_page(p[i]);
> > > > +
> > > > +   if (!page_has_obj_cgroups(page) &&
> > >
> > > The page is already linked into the kmem_cache, don't you need
> > > synchronization for memcg_alloc_page_obj_cgroups().
> >
> > Hm, yes, in theory we need it. I guess the reason behind why I've never 
> > seen any issues
> > here is the SLUB percpu partial list.
> >
> > So in theory we need something like:
> >
> > diff --git a/mm/slab.h b/mm/slab.h
> > index 0a31600a0f5c..44bf57815816 100644
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -237,7 +237,10 @@ static inline int memcg_alloc_page_obj_cgroups(struct page *page,
> > if (!vec)
> > return -ENOMEM;
> >
> > -   page->obj_cgroups = (struct obj_cgroup **) ((unsigned long)vec | 0x1UL);
> > +   if (cmpxchg(&page->obj_cgroups, 0,
> > +   (struct obj_cgroup **) ((unsigned long)vec | 0x1UL)))
> > +   kfree(vec);
> > +
> > return 0;
> >  }
> >
> >
> > But I wonder if we might put it under #ifdef CONFIG_SLAB?
> > Or any other ideas how to make it less expensive?
> >
> > > What's the reason to remove this from charge_slab_page()?
> >
> > Because at charge_slab_page() we don't know if we'll ever need
> > page->obj_cgroups. Some caches might have only few or even zero
> > accounted objects.
> >
> 
> If slab_pre_alloc_hook() returns a non-NULL objcg then we definitely
> need page->obj_cgroups.  The charge_slab_page() happens between
> slab_pre_alloc_hook() & slab_post_alloc_hook(), so, we should be able
> to tell if page->obj_cgroups is needed.

Yes, but the opposite is not always true: we can reuse the existing page
without allocated page->obj_cgroups. In this case charge_slab_page() is
not involved at all.

Or do you mean that we can minimize the amount of required synchronization
by allocating some obj_cgroups vectors from charge_slab_page()?


Re: [PATCH] initrd: Remove erroneous comment

2020-06-22 Thread Tom Rini
On Mon, Jun 22, 2020 at 02:03:28PM -0700, H. Peter Anvin wrote:
> On 2020-06-22 14:01, Tom Rini wrote:
> > 
> > I'm picky here because, well, there's a whole lot of moving parts in the
> > pre-kernel world.  In a strict sense, "UEFI" doesn't do anything with
> > the kernel but based on hpa's comments I assume that at least the
> > in-kernel UEFI stub does what Documentation/x86/booting.rst suggests to
> > do and consumes initrd=/file just like "initrd /file" in extlinux.conf,
> > etc do.  And since the EFI stub is cross-platform, it's worth noting
> > this too.
> 
> For what it's worth, normally boot loaders don't strip this from the
> kernel command line passed to the kernel, although there might be ones
> which do so. In general this is bad practice; it is better to let the
> initrd show in /proc/cmdline.

Strongly agree.

-- 
Tom


signature.asc
Description: PGP signature


Re: [PATCH] initrd: Remove erroneous comment

2020-06-22 Thread Tom Rini
On Mon, Jun 22, 2020 at 01:48:45PM -0700, H. Peter Anvin wrote:
> On 2020-06-22 13:40, Tom Rini wrote:
> > On Mon, Jun 22, 2020 at 01:02:16PM -0700, ron minnich wrote:
> > 
> >> The other thing you ought to consider fixing:
> >> initrd is documented as follows:
> >>
> >> initrd= [BOOT] Specify the location of the initial ramdisk
> >>
> >> for bootloaders only.
> >>
> >> UEFI consumes initrd from the command line as well. As ARM servers
> >> increasingly use UEFI, there may be situations in which the initrd
> >> option doesn't make its way to the kernel? I don't know, UEFI is such
> >> a black box to me. But I've seen this "initrd consumption" happen.
> >>
> >> Based on docs, and the growing use of bootloaders that are happy to
> >> consume initrd= and not pass it to the kernel, you might be better off
> >> trying to move to the new command line option anyway.
> >>
> >> IOW, this comment may not be what people want to see, but ... it might
> >> also be right. Or possibly changed to:
> >>
> >> /*
> >>  * The initrd keyword is in use today on ARM, PowerPC, and MIPS.
> >>  * It is also reserved for use by bootloaders such as UEFI and may
> >>  * be consumed by them and not passed on to the kernel.
> >>  * The documentation also shows it as reserved for bootloaders.
> >>  * It is advised to move to the initrdmem= option wherever possible.
> >>  */
> > 
> > Fair warning, one of the other hats I wear is the chief custodian of the
> > U-Boot project.
> > 
> > Note that on most architectures in modern times the device tree is used
> > to pass in initrd type information and "initrd=" on the command line is
> > quite legacy.
> > 
> > But what do you mean UEFI "consumes" initrd= ?  It's quite expected that
> > when you configure grub/syslinux/systemd-boot/whatever via extlinux.conf
> > or similar with "initrd /some/file" something reasonable happens to
> > read that in to memory and pass along the location to Linux (which can
> > vary from arch to arch, when not using device tree).  I guess looking at 
> > Documentation/x86/boot.rst is where treating initrd= as a file that
> > should be handled and ramdisk_image / ramdisk_size set came from.  I do
> > wonder what happens in the case of ARM/ARM64 + UEFI without device tree.
> > 
> 
> UEFI plus the in-kernel UEFI stub is, in some ways, a "bootloader" in
> the traditional sense. It is totally fair that we should update the
> documentation with this as a different case, though, because it is part
> of the kernel tree and so the kernel now has partial ownership of the
> namespace.
> 
> I suggest "STUB" for "in-kernel firmware stub" for this purpose; no need
> to restrict it to a specific firmware for the purpose of namespace
> reservation.

With a little bit of quick digging, yes, it would be good to document
and be very clear which things are reserved for (and how are treated by)
the in-kernel firmware stub or "kernel EFI stub" or whatever name is
best for drivers/firmware/efi/libstub/.  I forget the last time we tried
booting a linux kernel EFI stub rather than grub/etc over in U-Boot
under our EFI loader support but it's reasonable to expect that it work.
Thanks!

-- 
Tom


signature.asc
Description: PGP signature


Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations

2020-06-22 Thread Shakeel Butt
On Mon, Jun 22, 2020 at 1:37 PM Roman Gushchin  wrote:
>
> On Mon, Jun 22, 2020 at 12:21:28PM -0700, Shakeel Butt wrote:
> > On Mon, Jun 8, 2020 at 4:07 PM Roman Gushchin  wrote:
> > >
> > > Instead of having two sets of kmem_caches: one for system-wide and
> > > non-accounted allocations and the second one shared by all accounted
> > > allocations, we can use just one.
> > >
> > > The idea is simple: space for obj_cgroup metadata can be allocated
> > > on demand and filled only for accounted allocations.
> > >
> > > It allows us to remove a bunch of code that is required to handle
> > > kmem_cache clones for accounted allocations. There is no more need
> > > to create them, accumulate statistics, propagate attributes, etc.
> > > It's quite a significant simplification.
> > >
> > > Also, because the total number of slab_caches is reduced by almost
> > > half (not all kmem_caches have a memcg clone), some additional memory
> > > savings are expected. On my devvm it additionally saves about 3.5%
> > > of slab memory.
> > >
> > > Suggested-by: Johannes Weiner 
> > > Signed-off-by: Roman Gushchin 
> > > Reviewed-by: Vlastimil Babka 
> > > ---
> > [snip]
> > >  static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
> > >   struct obj_cgroup *objcg,
> > > - size_t size, void **p)
> > > + gfp_t flags, size_t size,
> > > + void **p)
> > >  {
> > > struct page *page;
> > > unsigned long off;
> > > size_t i;
> > >
> > > +   if (!objcg)
> > > +   return;
> > > +
> > > +   flags &= ~__GFP_ACCOUNT;
> > > for (i = 0; i < size; i++) {
> > > if (likely(p[i])) {
> > > page = virt_to_head_page(p[i]);
> > > +
> > > +   if (!page_has_obj_cgroups(page) &&
> >
> > The page is already linked into the kmem_cache, don't you need
> > synchronization for memcg_alloc_page_obj_cgroups().
>
> Hm, yes, in theory we need it. I guess the reason behind why I've never seen 
> any issues
> here is the SLUB percpu partial list.
>
> So in theory we need something like:
>
> diff --git a/mm/slab.h b/mm/slab.h
> index 0a31600a0f5c..44bf57815816 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -237,7 +237,10 @@ static inline int memcg_alloc_page_obj_cgroups(struct page *page,
> if (!vec)
> return -ENOMEM;
>
> -   page->obj_cgroups = (struct obj_cgroup **) ((unsigned long)vec | 0x1UL);
> +   if (cmpxchg(&page->obj_cgroups, 0,
> +   (struct obj_cgroup **) ((unsigned long)vec | 0x1UL)))
> +   kfree(vec);
> +
> return 0;
>  }
>
>
> But I wonder if we might put it under #ifdef CONFIG_SLAB?
> Or any other ideas how to make it less expensive?
>
> > What's the reason to remove this from charge_slab_page()?
>
> Because at charge_slab_page() we don't know if we'll ever need
> page->obj_cgroups. Some caches might have only few or even zero
> accounted objects.
>

If slab_pre_alloc_hook() returns a non-NULL objcg then we definitely
need page->obj_cgroups.  The charge_slab_page() happens between
slab_pre_alloc_hook() & slab_post_alloc_hook(), so, we should be able
to tell if page->obj_cgroups is needed.


Re: [RFC] MFD's relationship with Device Tree (OF)

2020-06-22 Thread Michael Walle

On 2020-06-14 12:26, Michael Walle wrote:

Hi Rob,

On 2020-06-10 00:03, Rob Herring wrote:
[..]

Yes, we should use 'reg' whenever possible. If we don't have 'reg',
then you shouldn't have a unit-address either and you can simply match
on the node name (standard DT driver matching is with compatible,
device_type, and node name (w/o unit-address)). We've generally been
doing 'classname-N' when there's no 'reg' to do 'classname@N'.
Matching on 'classname-N' would work with node name matching as only
unit-addresses are stripped.


This still has me thinking. Shouldn't we allow the (MFD!) device
driver creator to choose between "classname@N" and "classname-N"?
In most cases N is not made up, but it is arbitrarily chosen;
for example, you've chosen the bank for the ab8500 'reg'. It is not
a defined entity, like an I2C address if your parent is an I2C bus,
or a SPI chip select, or the memory address in the case of MMIO. Instead,
the device driver creator just chooses some "random" property from
the datasheet; another driver creator might have chosen another
property. Wouldn't it make more sense to just say this MFD provides
N pwm devices and the subnodes are matched based on pwm-{0,1..N-1}?
That would also be the logical consequence of the current MFD sub-
device to OF node matching code, which only supports N=1.
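
For illustration, the two naming schemes side by side for a hypothetical
MFD (made-up compatibles, not any real binding):

	mfd@10 {
		compatible = "vendor,mfd";
		reg = <0x10>;

		/* variant A: made-up unit-address from a datasheet property */
		pwm@3 {
			compatible = "vendor,mfd-pwm";
			reg = <3>;
		};

		/* variant B: no 'reg' at all, plain index-based node name */
		pwm-0 {
			compatible = "vendor,mfd-pwm";
		};
	};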



Rob? Lee?

-michael


Re: [PATCH v2 04/16] b43: Remove uninitialized_var() usage

2020-06-22 Thread Kees Cook
On Mon, Jun 22, 2020 at 10:04:18AM -0700, Nick Desaulniers wrote:
> On Fri, Jun 19, 2020 at 8:30 PM Kees Cook  wrote:
> >
> > Using uninitialized_var() is dangerous as it papers over real bugs[1]
> > (or can in the future), and suppresses unrelated compiler warnings (e.g.
> > "unused variable"). If the compiler thinks it is uninitialized, either
> > simply initialize the variable or make compiler changes. As a precursor
> > to removing[2] this[3] macro[4], just initialize this variable to NULL.
> > No later NULL deref is possible due to the early returns outside of the
> > (phy->rev >= 7 && phy->rev < 19) case, which explicitly tests for NULL.
> >
> > [1] https://lore.kernel.org/lkml/20200603174714.192027-1-gli...@google.com/
> > [2] 
> > https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1tgqcr5vqkczwj0qxk6cernou6eedsuda...@mail.gmail.com/
> > [3] 
> > https://lore.kernel.org/lkml/ca+55afwgbgqhbp1fkxvrkepzyr5j8n1vkt1vzdz9knmpuxh...@mail.gmail.com/
> > [4] 
> > https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yvju65tplgn_ybynv0ve...@mail.gmail.com/
> >
> > Fixes: 58619b14d106 ("b43: move under broadcom vendor directory")
> > Signed-off-by: Kees Cook 
> 
> I see three total uses of uninitialized_var() in this file, do we want
> to eliminate all of them?

This is the only one that needed an explicit initialization -- all the
others are handled in the treewide patch. I *could* split it out here,
but I found it easier to keep the "no op" changes together in the
treewide patch.

-Kees

> 
> > ---
> >  drivers/net/wireless/broadcom/b43/phy_n.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/wireless/broadcom/b43/phy_n.c 
> > b/drivers/net/wireless/broadcom/b43/phy_n.c
> > index c33b4235839d..46db91846007 100644
> > --- a/drivers/net/wireless/broadcom/b43/phy_n.c
> > +++ b/drivers/net/wireless/broadcom/b43/phy_n.c
> > @@ -4222,7 +4222,7 @@ static void b43_nphy_tx_gain_table_upload(struct 
> > b43_wldev *dev)
> > u32 rfpwr_offset;
> > u8 pga_gain, pad_gain;
> > int i;
> > -   const s16 *uninitialized_var(rf_pwr_offset_table);
> > +   const s16 *rf_pwr_offset_table = NULL;
> >
> > table = b43_nphy_get_tx_gain_table(dev);
> > if (!table)
> > --
> 
> -- 
> Thanks,
> ~Nick Desaulniers

-- 
Kees Cook


Re: [PATCH] initrd: Remove erroneous comment

2020-06-22 Thread H. Peter Anvin
On 2020-06-22 14:01, Tom Rini wrote:
> 
> I'm picky here because, well, there's a whole lot of moving parts in the
> pre-kernel world.  In a strict sense, "UEFI" doesn't do anything with
> the kernel but based on hpa's comments I assume that at least the
> in-kernel UEFI stub does what Documentation/x86/boot.rst suggests to
> do and consumes initrd=/file just like "initrd /file" in extlinux.conf,
> etc do.  And since the EFI stub is cross-platform, it's worth noting
> this too.
> 

For what it's worth, normally boot loaders don't strip this from the
kernel command line passed to the kernel, although there might be ones
which do so. In general this is bad practice; it is better to let the
initrd show in /proc/cmdline.

-hpa



Re: [PATCH] initrd: Remove erroneous comment

2020-06-22 Thread Tom Rini
On Mon, Jun 22, 2020 at 01:56:24PM -0700, ron minnich wrote:

> So, let me first add,  the comment can be removed as needed. Comments
> offered only for clarification.

Noted, thanks.

> On Mon, Jun 22, 2020 at 1:40 PM Tom Rini  wrote:
> 
> > But what do you mean UEFI "consumes" initrd= ?
> 
> What I mean is, there are bootloaders that will, if they see initrd=
> in the command line, remove it: the kernel will never see it.

I'm picky here because, well, there's a whole lot of moving parts in the
pre-kernel world.  In a strict sense, "UEFI" doesn't do anything with
the kernel but based on hpa's comments I assume that at least the
in-kernel UEFI stub does what Documentation/x86/boot.rst suggests to
do and consumes initrd=/file just like "initrd /file" in extlinux.conf,
etc do.  And since the EFI stub is cross-platform, it's worth noting
this too.

> >  I guess looking at
> > Documentation/x86/boot.rst is where treating initrd= as a file that
> > should be handled and ramdisk_image / ramdisk_size set came from.  I do
> > wonder what happens in the case of ARM/ARM64 + UEFI without device tree.
> 
> it is possible that the initrd= argument will not be seen by the
> kernel. That's my understanding. Will this be a problem if so? It
> would be for me :-)
> 
> >  And it doesn't provide any sort of link / context to the
> > boot loader specification project or similar that explains the cases
> > when a non-filename "initrd=" would reasonably (or unreasonably but
> > happens in reality) be removed.
> 
> But it unreasonably happens as I learned the hard way :-)
> 
> Anyway, thanks Tom, I have no objections to whatever you all feel is
> best to do with that comment. It was a failed attempt on my part to
> explain the state of things :-)

Booting up the kernel is quite the "fun" area indeed.

-- 
Tom


signature.asc
Description: PGP signature


Re: [PATCH 3/5] Huawei BMA: Adding Huawei BMA driver: host_veth_drv

2020-06-22 Thread kernel test robot
Hi,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on sparc-next/master]
[also build test WARNING on linux/master linus/master v5.8-rc2 next-20200622]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/yunaixin03610-163-com/Adding-Huawei-BMA-drivers/20200623-014140
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next.git 
master
config: sh-allmodconfig (attached as .config)
compiler: sh4-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=sh

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>, old ones prefixed by <<):

drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c:394:5: warning: no previous prototype for 'bspveth_setup_tx_resources' [-Wmissing-prototypes]
394 | s32 bspveth_setup_tx_resources(struct bspveth_device *pvethdev,
| ^~
In file included from arch/sh/include/asm/thread_info.h:15,
from include/linux/thread_info.h:38,
from include/asm-generic/preempt.h:5,
from ./arch/sh/include/generated/asm/preempt.h:1,
from include/linux/preempt.h:78,
from include/linux/spinlock.h:51,
from include/linux/seqlock.h:36,
from include/linux/time.h:6,
from include/linux/stat.h:19,
from include/linux/module.h:13,
from drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c:18:
drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c: In function 'bspveth_setup_tx_resources':
drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c:427:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
427 |  ptx_queue->pbdbase_p = (u8 *)(__pa((BSP_VETH_T)(ptx_queue->pbdbase_v)));
| ^
arch/sh/include/asm/page.h:138:20: note: in definition of macro '___pa'
138 | #define ___pa(x) ((x)-PAGE_OFFSET)
|^
>> drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c:427:32: note: in expansion of macro '__pa'
427 |  ptx_queue->pbdbase_p = (u8 *)(__pa((BSP_VETH_T)(ptx_queue->pbdbase_v)));
|^~~~
drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c: At top level:
drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c:443:6: warning: no previous prototype for 'bspveth_free_tx_resources' [-Wmissing-prototypes]
443 | void bspveth_free_tx_resources(struct bspveth_device *pvethdev,
|  ^
drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c:485:5: warning: no previous prototype for 'bspveth_setup_all_tx_resources' [-Wmissing-prototypes]
485 | s32 bspveth_setup_all_tx_resources(struct bspveth_device *pvethdev)
| ^~
drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c: In function 'bspveth_setup_all_tx_resources':
drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c:514:33: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
514 |(struct bspveth_dma_shmbd *)((BSP_VETH_T)(shmq_head)
| ^
drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c:514:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
514 |(struct bspveth_dma_shmbd *)((BSP_VETH_T)(shmq_head)
|^
drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c:517:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
517 |(u8 *)((BSP_VETH_T)(shmq_head_p)
|   ^
drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c:517:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
517 |(u8 *)((BSP_VETH_T)(shmq_head_p)
|^
drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c:520:28: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
520 |(struct bspveth_dmal *)((BSP_VETH_T)(shmq_head)
|^
drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c:520:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
520 |(struct bspveth_dmal *)((BSP_VETH_T)(shmq_head)
|^
drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c:523:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
523 |(u8 *)(u64)(VETH_SHAREPOOL_BASE_INBMC +
|^
drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c: At top level:
drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c:554:6: warning: no previous prototype for 'bspveth_free_all_tx_resources' [-Wmissing-prototypes]
554 | void bspveth_free_all_tx_resources(struct bspveth_device *pvethdev)
|  ^
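
One conventional way to address the pointer/integer size warnings (a
sketch, not part of the robot report; field names taken from the warnings
above) is to cast through uintptr_t instead of a fixed-width type:

	/* uintptr_t is sized to the pointer width on both 32-bit (sh) and
	 * 64-bit targets, unlike a hard-coded 64-bit type such as BSP_VETH_T. */
	ptx_queue->pbdbase_p =
		(u8 *)(uintptr_t)__pa((uintptr_t)ptx_queue->pbdbase_v);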

Re: [PATCH] binder: fix null deref of proc->context

2020-06-22 Thread Todd Kjos
On Mon, Jun 22, 2020 at 1:18 PM Todd Kjos  wrote:
>
> On Mon, Jun 22, 2020 at 1:09 PM Christian Brauner
>  wrote:
> >
> > On Mon, Jun 22, 2020 at 01:07:15PM -0700, Todd Kjos wrote:
> > > The binder driver makes the assumption proc->context pointer is invariant 
> > > after
> > > initialization (as documented in the kerneldoc header for struct proc).
> > > However, in commit f0fe2c0f050d ("binder: prevent UAF for binderfs 
> > > devices II")
> > > proc->context is set to NULL during binder_deferred_release().
> > >
> > > Another proc was in the middle of setting up a transaction to the dying
> > > process and crashed on a NULL pointer deref on "context" which is a local
> > > set to >context:
> > >
> > > new_ref->data.desc = (node == context->binder_context_mgr_node) ? 0 : 
> > > 1;
> > >
> > > Here's the stack:
> > >
> > > [ 5237.855435] Call trace:
> > > [ 5237.855441] binder_get_ref_for_node_olocked+0x100/0x2ec
> > > [ 5237.855446] binder_inc_ref_for_node+0x140/0x280
> > > [ 5237.855451] binder_translate_binder+0x1d0/0x388
> > > [ 5237.855456] binder_transaction+0x2228/0x3730
> > > [ 5237.855461] binder_thread_write+0x640/0x25bc
> > > [ 5237.855466] binder_ioctl_write_read+0xb0/0x464
> > > [ 5237.855471] binder_ioctl+0x30c/0x96c
> > > [ 5237.855477] do_vfs_ioctl+0x3e0/0x700
> > > [ 5237.855482] __arm64_sys_ioctl+0x78/0xa4
> > > [ 5237.855488] el0_svc_common+0xb4/0x194
> > > [ 5237.855493] el0_svc_handler+0x74/0x98
> > > [ 5237.855497] el0_svc+0x8/0xc
> > >
> > > The fix is to move the kfree of the binder_device to binder_free_proc()
> > > so the binder_device is freed when we know there are no references
> > > remaining on the binder_proc.
> > >
> > > Fixes: f0fe2c0f050d ("binder: prevent UAF for binderfs devices II")
> > > Signed-off-by: Todd Kjos 
>
> Forgot to include stable. The issue was introduced in 5.6, so fix needed in 
> 5.7.
> Cc: sta...@vger.kernel.org # 5.7

Turns out the patch with the issue was also backported to 5.4.y, so
the fix is needed there too.

>
>
> >
> >
> > Thanks, looks good to me!
> > Acked-by: Christian Brauner 
> >
> > Christian
> >
> > > ---
> > >  drivers/android/binder.c | 14 +++---
> > >  1 file changed, 7 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> > > index e47c8a4c83db..f50c5f182bb5 100644
> > > --- a/drivers/android/binder.c
> > > +++ b/drivers/android/binder.c
> > > @@ -4686,8 +4686,15 @@ static struct binder_thread *binder_get_thread(struct binder_proc *proc)
> > >
> > >  static void binder_free_proc(struct binder_proc *proc)
> > >  {
> > > + struct binder_device *device;
> > > +
> > >   BUG_ON(!list_empty(&proc->todo));
> > >   BUG_ON(!list_empty(&proc->delivered_death));
> > > + device = container_of(proc->context, struct binder_device, context);
> > > + if (refcount_dec_and_test(&device->ref)) {
> > > + kfree(proc->context->name);
> > > + kfree(device);
> > > + }
> > >   binder_alloc_deferred_release(&proc->alloc);
> > >   put_task_struct(proc->tsk);
> > >   binder_stats_deleted(BINDER_STAT_PROC);
> > > @@ -5406,7 +5413,6 @@ static int binder_node_release(struct binder_node *node, int refs)
> > >  static void binder_deferred_release(struct binder_proc *proc)
> > >  {
> > >   struct binder_context *context = proc->context;
> > > - struct binder_device *device;
> > >   struct rb_node *n;
> > >   int threads, nodes, incoming_refs, outgoing_refs, active_transactions;
> > >
> > > @@ -5423,12 +5429,6 @@ static void binder_deferred_release(struct binder_proc *proc)
> > >   context->binder_context_mgr_node = NULL;
> > >   }
> > >   mutex_unlock(&context->context_mgr_node_lock);
> > > - device = container_of(proc->context, struct binder_device, context);
> > > - if (refcount_dec_and_test(&device->ref)) {
> > > - kfree(context->name);
> > > - kfree(device);
> > > - }
> > > - proc->context = NULL;
> > >   binder_inner_proc_lock(proc);
> > >   /*
> > >* Make sure proc stays alive after we
> > > --
> > > 2.27.0.111.gc72c7da667-goog
> > >


[PATCH v2 0/2] arm64: Warn on orphan section placement

2020-06-22 Thread Kees Cook
v2:
- split by architecture, rebase to v5.8-rc2
v1: https://lore.kernel.org/lkml/20200228002244.15240-1-keesc...@chromium.org/

A recent bug[1] was solved for builds linked with ld.lld, and tracking
it down took way longer than it needed to (a year). Ultimately, it
boiled down to differences between ld.bfd and ld.lld's handling of
orphan sections. Similarly, the recent FGKASLR series brought up orphan
section handling too[2]. In both cases, it would have been nice if the
linker was running with --orphan-handling=warn so that surprise sections
wouldn't silently get mapped into the kernel image at locations up to the
whim of the linker's orphan handling logic. Instead, all desired sections
should be explicitly identified in the linker script (to be either kept or
discarded) with any orphans throwing a warning. The powerpc architecture
actually already does this, so this series extends coverage to arm64.

This series needs one additional commit that is not yet in
any tree, but I hope to have it landed via x86 -tip shortly:
https://lore.kernel.org/lkml/20200622205341.2987797-2-keesc...@chromium.org

Thanks!

-Kees

[1] https://github.com/ClangBuiltLinux/linux/issues/282
[2] https://lore.kernel.org/lkml/202002242122.AA4D1B8@keescook/

Kees Cook (2):
  arm64/build: Use common DISCARDS in linker script
  arm64/build: Warn on orphan section placement

 arch/arm64/Makefile             |  4 ++++
 arch/arm64/kernel/vmlinux.lds.S | 10 ++++++----
 2 files changed, 10 insertions(+), 4 deletions(-)

-- 
2.25.1



[PATCH v2 2/2] arm64/build: Warn on orphan section placement

2020-06-22 Thread Kees Cook
We don't want to depend on the linker's orphan section placement
heuristics as these can vary between linkers, and may change between
versions. All sections need to be explicitly named in the linker
script.

Explicitly include debug sections when they're present. Add .eh_frame*
to discard as it seems that these are still generated even though
-fno-asynchronous-unwind-tables is being specified. Add .plt and
.data.rel.ro to discards as they are not actually used. Add .got.plt
to the image as it does appear to be mapped near .data. Finally enable
orphan section warnings.

Signed-off-by: Kees Cook 
---
 arch/arm64/Makefile             | 4 ++++
 arch/arm64/kernel/vmlinux.lds.S | 5 ++++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index a0d94d063fa8..3e628983445a 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -29,6 +29,10 @@ LDFLAGS_vmlinux  += --fix-cortex-a53-843419
   endif
 endif
 
+# We never want expected sections to be placed heuristically by the
+# linker. All sections should be explicitly named in the linker script.
+LDFLAGS_vmlinux += --orphan-handling=warn
+
 ifeq ($(CONFIG_ARM64_USE_LSE_ATOMICS), y)
   ifneq ($(CONFIG_ARM64_LSE_ATOMICS), y)
 $(warning LSE atomics not supported by binutils)
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 5427f502c3a6..c9ecb3b2007d 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -94,7 +94,8 @@ SECTIONS
/DISCARD/ : {
*(.interp .dynamic)
*(.dynsym .dynstr .hash .gnu.hash)
-   *(.eh_frame)
+   *(.plt) *(.data.rel.ro)
+   *(.eh_frame) *(.init.eh_frame)
}
 
. = KIMAGE_VADDR + TEXT_OFFSET;
@@ -209,6 +210,7 @@ SECTIONS
_data = .;
_sdata = .;
RW_DATA(L1_CACHE_BYTES, PAGE_SIZE, THREAD_ALIGN)
+   .got.plt : ALIGN(8) { *(.got.plt) }
 
/*
 * Data written with the MMU off but read with the MMU on requires
@@ -244,6 +246,7 @@ SECTIONS
_end = .;
 
STABS_DEBUG
+   DWARF_DEBUG
 
HEAD_SYMBOLS
 }
-- 
2.25.1



[PATCH v2 1/2] arm64/build: Use common DISCARDS in linker script

2020-06-22 Thread Kees Cook
Use the common DISCARDS rule for the linker script in an effort to
regularize the linker script to prepare for warning on orphaned
sections. Additionally clean up left-over no-op macros.

Signed-off-by: Kees Cook 
Acked-by: Will Deacon 
---
 arch/arm64/kernel/vmlinux.lds.S | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 6827da7f3aa5..5427f502c3a6 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -6,6 +6,7 @@
  */
 
 #define RO_EXCEPTION_TABLE_ALIGN   8
+#define RUNTIME_DISCARD_EXIT
 
 #include 
 #include 
@@ -89,10 +90,8 @@ SECTIONS
 * matching the same input section name.  There is no documented
 * order of matching.
 */
+   DISCARDS
/DISCARD/ : {
-   EXIT_CALL
-   *(.discard)
-   *(.discard.*)
*(.interp .dynamic)
*(.dynsym .dynstr .hash .gnu.hash)
*(.eh_frame)
-- 
2.25.1



Re: [PATCH] initrd: Remove erroneous comment

2020-06-22 Thread ron minnich
So, let me first add: the comment can be removed as needed. The comments
below are offered only for clarification.

On Mon, Jun 22, 2020 at 1:40 PM Tom Rini  wrote:

> But what do you mean UEFI "consumes" initrd= ?

What I mean is, there are bootloaders that will, if they see initrd=
in the command line, remove it: the kernel will never see it.

>  I guess looking at
> Documentation/x86/boot.rst is where treating initrd= as a file that
> should be handled and ramdisk_image / ramdisk_size set came from.  I do
> wonder what happens in the case of ARM/ARM64 + UEFI without device tree.

it is possible that the initrd= argument will not be seen by the
kernel. That's my understanding. Will this be a problem if so? It
would be for me :-)

>  And it doesn't provide any sort of link / context to the
> boot loader specification project or similar that explains the cases
> when a non-filename "initrd=" would reasonably (or unreasonably but
> happens in reality) be removed.

But it unreasonably happens as I learned the hard way :-)

Anyway, thanks Tom, I have no objections to whatever you all feel is
best to do with that comment. It was a failed attempt on my part to
explain the state of things :-)

ron


[PATCH v2 3/3] x86/boot: Warn on orphan section placement

2020-06-22 Thread Kees Cook
We don't want to depend on the linker's orphan section placement
heuristics as these can vary between linkers, and may change between
versions. All sections need to be explicitly named in the linker
script.

Add the common debugging sections. Discard the unused note, rel, plt,
dyn, and hash sections that are not needed in the compressed vmlinux.
Disable .eh_frame generation in the linker and enable orphan section
warnings.

Signed-off-by: Kees Cook 
---
 arch/x86/boot/compressed/Makefile  |  3 ++-
 arch/x86/boot/compressed/vmlinux.lds.S | 11 +++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index 7619742f91c9..646720a05f89 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -48,6 +48,7 @@ GCOV_PROFILE := n
 UBSAN_SANITIZE :=n
 
 KBUILD_LDFLAGS := -m elf_$(UTS_MACHINE)
+KBUILD_LDFLAGS += $(call ld-option,--no-ld-generated-unwind-info)
 # Compressed kernel should be built as PIE since it may be loaded at any
 # address by the bootloader.
 ifeq ($(CONFIG_X86_32),y)
@@ -59,7 +60,7 @@ else
 KBUILD_LDFLAGS += $(shell $(LD) --help 2>&1 | grep -q "\-z noreloc-overflow" \
&& echo "-z noreloc-overflow -pie --no-dynamic-linker")
 endif
-LDFLAGS_vmlinux := -T
+LDFLAGS_vmlinux := --orphan-handling=warn -T
 
 hostprogs  := mkpiggy
 HOST_EXTRACFLAGS += -I$(srctree)/tools/include
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S 
b/arch/x86/boot/compressed/vmlinux.lds.S
index 8f1025d1f681..6fe3ecdfd685 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -75,5 +75,16 @@ SECTIONS
. = ALIGN(PAGE_SIZE);   /* keep ZO size page aligned */
_end = .;
 
+   STABS_DEBUG
+   DWARF_DEBUG
+
DISCARDS
+   /DISCARD/ : {
+   *(.note.*)
+   *(.rela.*) *(.rela_*)
+   *(.rel.*) *(.rel_*)
+   *(.plt) *(.plt.*)
+   *(.dyn*)
+   *(.hash) *(.gnu.hash)
+   }
 }
-- 
2.25.1



[PATCH v2 2/3] x86/build: Warn on orphan section placement

2020-06-22 Thread Kees Cook
We don't want to depend on the linker's orphan section placement
heuristics as these can vary between linkers, and may change between
versions. All sections need to be explicitly named in the linker
script.

Discard the unused rela, plt, and got sections that are not needed in
the final vmlinux, and enable orphan section warnings.

Signed-off-by: Kees Cook 
---
 arch/x86/Makefile | 4 
 arch/x86/kernel/vmlinux.lds.S | 6 ++
 2 files changed, 10 insertions(+)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 00e378de8bc0..f8a5b2333729 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -51,6 +51,10 @@ ifdef CONFIG_X86_NEED_RELOCS
 LDFLAGS_vmlinux := --emit-relocs --discard-none
 endif
 
+# We never want expected sections to be placed heuristically by the
+# linker. All sections should be explicitly named in the linker script.
+LDFLAGS_vmlinux += --orphan-handling=warn
+
 #
 # Prevent GCC from generating any FP code by mistake.
 #
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 3bfc8dd8a43d..bb085ceeaaad 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -412,6 +412,12 @@ SECTIONS
DWARF_DEBUG
 
DISCARDS
+   /DISCARD/ : {
+   *(.rela.*) *(.rela_*)
+   *(.rel.*) *(.rel_*)
+   *(.got) *(.got.*)
+   *(.igot.*) *(.iplt)
+   }
 }
 
 
-- 
2.25.1



[PATCH v2 0/3] x86: Warn on orphan section placement

2020-06-22 Thread Kees Cook
v2:
- split by architecture, rebase to v5.8-rc2
v1: https://lore.kernel.org/lkml/20200228002244.15240-1-keesc...@chromium.org/

A recent bug[1] was solved for builds linked with ld.lld, and tracking
it down took way longer than it needed to (a year). Ultimately, it
boiled down to differences between ld.bfd and ld.lld's handling of
orphan sections. Similarly, the recent FGKASLR series brought up orphan
section handling too[2]. In both cases, it would have been nice if the
linker was running with --orphan-handling=warn so that surprise sections
wouldn't silently get mapped into the kernel image at locations up to the
whim of the linker's orphan handling logic. Instead, all desired sections
should be explicitly identified in the linker script (to be either kept or
discarded) with any orphans throwing a warning. The powerpc architecture
actually already does this, so this series extends coverage to x86.

Thanks!

-Kees

[1] https://github.com/ClangBuiltLinux/linux/issues/282
[2] https://lore.kernel.org/lkml/202002242122.AA4D1B8@keescook/

Kees Cook (3):
  vmlinux.lds.h: Add .gnu.version* to DISCARDS
  x86/build: Warn on orphan section placement
  x86/boot: Warn on orphan section placement

 arch/x86/Makefile  |  4 
 arch/x86/boot/compressed/Makefile  |  3 ++-
 arch/x86/boot/compressed/vmlinux.lds.S | 11 +++
 arch/x86/kernel/vmlinux.lds.S  |  6 ++
 include/asm-generic/vmlinux.lds.h  |  1 +
 5 files changed, 24 insertions(+), 1 deletion(-)

-- 
2.25.1



[PATCH v2 1/3] vmlinux.lds.h: Add .gnu.version* to DISCARDS

2020-06-22 Thread Kees Cook
For vmlinux linking, no architecture uses the .gnu.version* section,
so remove it via the common DISCARDS macro in preparation for adding
--orphan-handling=warn more widely.

Signed-off-by: Kees Cook 
---
 include/asm-generic/vmlinux.lds.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index db600ef218d7..6fbe9ed10cdb 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -934,6 +934,7 @@
*(.discard) \
*(.discard.*)   \
*(.modinfo) \
+   *(.gnu.version*)\
}
 
 /**
-- 
2.25.1



Re: [PATCH 4.19 182/267] spi: dw: Return any value retrieved from the dma_transfer callback

2020-06-22 Thread Serge Semin
Hello Pavel

On Fri, Jun 19, 2020 at 11:07:19PM +0200, Pavel Machek wrote:
> On Fri 2020-06-19 16:32:47, Greg Kroah-Hartman wrote:
> > From: Serge Semin 
> > 
> > [ Upstream commit f0410bbf7d0fb80149e3b17d11d31f5b5197873e ]
> > 
> > DW APB SSI DMA-part of the driver may need to perform the requested
> > SPI-transfer synchronously. In that case the dma_transfer() callback
> > will return 0 as a marker of the SPI transfer being finished so the
> > SPI core doesn't need to wait and may proceed with the SPI message
> > transfers pumping procedure. This will be needed to fix the problem
> > when DMA transactions are finished, but there is still data left in
> > the SPI Tx/Rx FIFOs being sent/received. But for now make dma_transfer
> > return 1, as the normal dw_spi_transfer_one() method does.
> 

> As far as I understand, this is support for a new SoC, not a fix?

Not really. That patch is the first one of a series fixing a problem with
SPI transfer completion:
33726eff3d98 spi: dw: Add SPI Rx-done wait method to DMA-based transfer
1ade2d8a72f9 spi: dw: Add SPI Tx-done wait method to DMA-based transfer
bdbdf0f06337 spi: dw: Locally wait for the DMA transfers completion
f0410bbf7d0f spi: dw: Return any value retrieved from the dma_transfer callback

Anyway, having just the first commit applied is harmless, though pretty much
pointless for fixing the problem it was originally introduced for. But it
can be useful for something else. See my comment below.

> 
> > +++ b/drivers/spi/spi-dw.c
> > @@ -383,11 +383,8 @@ static int dw_spi_transfer_one(struct spi_controller 
> > *master,
> >  
> > spi_enable_chip(dws, 1);
> >  
> > -   if (dws->dma_mapped) {
> > -   ret = dws->dma_ops->dma_transfer(dws, transfer);
> > -   if (ret < 0)
> > -   return ret;
> > -   }
> > +   if (dws->dma_mapped)
> > +   return dws->dma_ops->dma_transfer(dws, transfer);
> >  
> > if (chip->poll_mode)
> > return poll_transfer(dws);
> 

> Mainline patch simply changes return value, but code is different in
> v4.19, and poll_transfer will now be avoided when dws->dma_mapped. Is
> that a problem?

Actually, no. In that old 4.19 context it's even better to return straight away
no matter what value is returned by the dma_transfer() callback. In the code
without this patch applied, the transfer_one() method will check the poll_mode
flag state even if dma_transfer() returns a positive value. The positive
value (1) means that the DMA transfer has been started and the SPI core must
wait for its completion. Needless to say, if the poll_mode flag happens to be
true, a poll transfer will then be executed alongside the DMA transfer, which,
as you understand, would be very wrong. So by applying this patch we implicitly
fix that problem. Although the probability of the problematic situation is very
low, since the DW APB SSI driver's poll mode hasn't been used by any SPI client
driver for a long time...
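
To make that contract concrete, here is a hedged sketch -- a minimal
model, not the actual spi-dw code; the struct and function names are
illustrative:

/*
 * dma_transfer() return contract:
 *   < 0  error
 *   = 0  transfer already completed synchronously
 *   = 1  DMA in flight, the SPI core must wait for completion
 */
struct xfer_ctx {
	int dma_mapped;
	int poll_mode;
	int (*dma_transfer)(void);
	int (*poll_transfer)(void); /* completes before returning */
};

static int transfer_one(struct xfer_ctx *c)
{
	if (c->dma_mapped)
		/* Return straight away: falling through would let
		 * poll_transfer() run alongside an in-flight DMA. */
		return c->dma_transfer();

	if (c->poll_mode)
		return c->poll_transfer();

	return 1; /* interrupt-driven: the SPI core waits for completion */
}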

-Sergey

> 
> Best regards,
>   Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) 
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html




arch/arm64/kernel/acpi.c:99:30: sparse: sparse: incorrect type in return expression (different address spaces)

2020-06-22 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   dd0d718152e4c65b173070d48ea9dfc06894c3e5
commit: 670d0a4b10704667765f7d18f7592993d02783aa sparse: use identifiers to 
define address spaces
date:   4 days ago
config: arm64-randconfig-s031-20200622 (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.2-rc1-18-g27caae40-dirty
git checkout 670d0a4b10704667765f7d18f7592993d02783aa
# save the attached .config to linux build tree
make W=1 C=1 ARCH=arm64 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 


sparse warnings: (new ones prefixed by >>)

>> arch/arm64/kernel/acpi.c:99:30: sparse: sparse: incorrect type in return 
>> expression (different address spaces) @@ expected void [noderef] __iomem 
>> * @@ got void * @@
>> arch/arm64/kernel/acpi.c:99:30: sparse: expected void [noderef] __iomem *
   arch/arm64/kernel/acpi.c:99:30: sparse: got void *
>> arch/arm64/kernel/acpi.c:107:24: sparse: sparse: incorrect type in argument 
>> 1 (different address spaces) @@ expected void *addr @@ got void 
>> [noderef] __iomem *map @@
   arch/arm64/kernel/acpi.c:107:24: sparse: expected void *addr
>> arch/arm64/kernel/acpi.c:107:24: sparse: got void [noderef] __iomem *map
--
   net/802/mrp.c:864:9: sparse: sparse: cast removes address space '__rcu' of 
expression
>> net/802/mrp.c:864:9: sparse: sparse: incorrect type in argument 1 (different 
>> address spaces) @@ expected void const volatile *p @@ got struct 
>> mrp_applicant [noderef] __rcu *[noderef] __rcu *__p @@
   net/802/mrp.c:864:9: sparse: expected void const volatile *p
>> net/802/mrp.c:864:9: sparse: got struct mrp_applicant [noderef] __rcu 
>> *[noderef] __rcu *__p
   net/802/mrp.c:864:9: sparse: sparse: dereference of noderef expression
   net/802/mrp.c:864:9: sparse: sparse: dereference of noderef expression
--
>> drivers/pci/controller/dwc/pcie-hisi.c:66:37: sparse: sparse: incorrect type 
>> in initializer (different address spaces) @@ expected void [noderef] 
>> __iomem *reg_base @@ got void *priv @@
>> drivers/pci/controller/dwc/pcie-hisi.c:66:37: sparse: expected void 
>> [noderef] __iomem *reg_base
   drivers/pci/controller/dwc/pcie-hisi.c:66:37: sparse: got void *priv
>> drivers/pci/controller/dwc/pcie-hisi.c:103:19: sparse: sparse: incorrect 
>> type in assignment (different address spaces) @@ expected void *priv @@  
>>got void [noderef] __iomem *[assigned] reg_base @@
   drivers/pci/controller/dwc/pcie-hisi.c:103:19: sparse: expected void 
*priv
>> drivers/pci/controller/dwc/pcie-hisi.c:103:19: sparse: got void 
>> [noderef] __iomem *[assigned] reg_base
--
>> drivers/phy/qualcomm/phy-qcom-ufs.c:84:21: sparse: sparse: cast removes 
>> address space '__iomem' of expression
   drivers/phy/qualcomm/phy-qcom-ufs.c:85:32: sparse: sparse: cast removes 
address space '__iomem' of expression
   drivers/phy/qualcomm/phy-qcom-ufs.c:96:21: sparse: sparse: cast removes 
address space '__iomem' of expression
--
>> drivers/firmware/efi/test/efi_test.c:157:13: sparse: sparse: incorrect type 
>> in initializer (different address spaces) @@ expected unsigned long 
>> [noderef] __user *__p @@ got unsigned long *[addressable] data_size @@
>> drivers/firmware/efi/test/efi_test.c:157:13: sparse: expected unsigned 
>> long [noderef] __user *__p
   drivers/firmware/efi/test/efi_test.c:157:13: sparse: got unsigned long 
*[addressable] data_size
   drivers/firmware/efi/test/efi_test.c:160:61: sparse: sparse: incorrect type 
in argument 2 (different address spaces) @@ expected void const [noderef] 
__user *from @@ got struct guid_t [usertype] *[addressable] vendor_guid @@
   drivers/firmware/efi/test/efi_test.c:160:61: sparse: expected void const 
[noderef] __user *from
   drivers/firmware/efi/test/efi_test.c:160:61: sparse: got struct guid_t 
[usertype] *[addressable] vendor_guid
   drivers/firmware/efi/test/efi_test.c:167:60: sparse: sparse: incorrect type 
in argument 2 (different address spaces) @@ expected unsigned short 
[noderef] [usertype] __user *src @@ got unsigned short [usertype] 
*[addressable] variable_name @@
   drivers/firmware/efi/test/efi_test.c:167:60: sparse: expected unsigned 
short [noderef] [usertype] __user *src
   drivers/firmware/efi/test/efi_test.c:167:60: sparse: got unsigned short 
[usertype] *[addressable] variable_name
>> drivers/firmware/efi/test/efi_test.c:187:13: sparse: sparse: incorrect type 
>> in initializer (different address spaces) @@ expected unsigned long 
>> [noderef] __

[PATCH v2 2/2] arm/boot: Warn on orphan section placement

2020-06-22 Thread Kees Cook
We don't want to depend on the linker's orphan section placement
heuristics as these can vary between linkers, and may change between
versions. All sections need to be explicitly named in the linker
script.

Use common macros for debug sections, discards, and text stubs. Add
discards for unwanted .note and .rel sections. Finally, enable orphan
section warnings.

Signed-off-by: Kees Cook 
---
 arch/arm/boot/compressed/Makefile  |  2 ++
 arch/arm/boot/compressed/vmlinux.lds.S | 17 +++--
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/arm/boot/compressed/Makefile 
b/arch/arm/boot/compressed/Makefile
index 00602a6fba04..b8a97d81662d 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -128,6 +128,8 @@ endif
 LDFLAGS_vmlinux += --no-undefined
 # Delete all temporary local symbols
 LDFLAGS_vmlinux += -X
+# Report orphan sections
+LDFLAGS_vmlinux += --orphan-handling=warn
 # Next argument is a linker script
 LDFLAGS_vmlinux += -T
 
diff --git a/arch/arm/boot/compressed/vmlinux.lds.S 
b/arch/arm/boot/compressed/vmlinux.lds.S
index 09ac33f52814..c2a8509f876f 100644
--- a/arch/arm/boot/compressed/vmlinux.lds.S
+++ b/arch/arm/boot/compressed/vmlinux.lds.S
@@ -2,6 +2,7 @@
 /*
  *  Copyright (C) 2000 Russell King
  */
+#include 
 
 #ifdef CONFIG_CPU_ENDIAN_BE8
#define ZIMAGE_MAGIC(x) ( (((x) >> 24) & 0x000000ff) | \
@@ -17,8 +18,11 @@ ENTRY(_start)
 SECTIONS
 {
   /DISCARD/ : {
+ARM_COMMON_DISCARD
 *(.ARM.exidx*)
 *(.ARM.extab*)
+*(.note.*)
+*(.rel.*)
 /*
  * Discard any r/w data - this produces a link error if we have any,
  * which is required for PIC decompression.  Local data generates
@@ -36,9 +40,7 @@ SECTIONS
 *(.start)
 *(.text)
 *(.text.*)
-*(.gnu.warning)
-*(.glue_7t)
-*(.glue_7)
+ARM_STUBS_TEXT
   }
   .table : ALIGN(4) {
 _table_start = .;
@@ -128,12 +130,7 @@ SECTIONS
   PROVIDE(__pecoff_data_size = ALIGN(512) - ADDR(.data));
   PROVIDE(__pecoff_end = ALIGN(512));
 
-  .stab 0  : { *(.stab) }
-  .stabstr 0   : { *(.stabstr) }
-  .stab.excl 0 : { *(.stab.excl) }
-  .stab.exclstr 0  : { *(.stab.exclstr) }
-  .stab.index 0: { *(.stab.index) }
-  .stab.indexstr 0 : { *(.stab.indexstr) }
-  .comment 0   : { *(.comment) }
+  STABS_DEBUG
+  DWARF_DEBUG
 }
 ASSERT(_edata_real == _edata, "error: zImage file size is incorrect");
-- 
2.25.1



[PATCH v2 1/2] arm/build: Warn on orphan section placement

2020-06-22 Thread Kees Cook
We don't want to depend on the linker's orphan section placement
heuristics as these can vary between linkers, and may change between
versions. All sections need to be explicitly named in the linker
script.

Specifically, this would have made a recently fixed bug very obvious:

ld: warning: orphan section `.fixup' from `arch/arm/lib/copy_from_user.o' being 
placed in section `.fixup'

Refactor linker script include file for use in standard and XIP linker
scripts, as well as in the coming boot linker script changes. Add debug
sections explicitly. Create ARM_COMMON_DISCARD macro with unneeded
sections .ARM.attributes, .iplt, .rel.iplt, .igot.plt, and .modinfo.
Create ARM_STUBS_TEXT macro with missed text stub sections .vfp11_veneer,
and .v4_bx. Finally enable orphan section warning.

Signed-off-by: Kees Cook 
---
 arch/arm/Makefile |  4 
 .../arm/{kernel => include/asm}/vmlinux.lds.h | 22 ++-
 arch/arm/kernel/vmlinux-xip.lds.S |  5 ++---
 arch/arm/kernel/vmlinux.lds.S |  5 ++---
 4 files changed, 25 insertions(+), 11 deletions(-)
 rename arch/arm/{kernel => include/asm}/vmlinux.lds.h (92%)

diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 59fde2d598d8..e414e3732b3a 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -16,6 +16,10 @@ LDFLAGS_vmlinux  += --be8
 KBUILD_LDFLAGS_MODULE  += --be8
 endif
 
+# We never want expected sections to be placed heuristically by the
+# linker. All sections should be explicitly named in the linker script.
+LDFLAGS_vmlinux += --orphan-handling=warn
+
 ifeq ($(CONFIG_ARM_MODULE_PLTS),y)
 KBUILD_LDS_MODULE  += $(srctree)/arch/arm/kernel/module.lds
 endif
diff --git a/arch/arm/kernel/vmlinux.lds.h b/arch/arm/include/asm/vmlinux.lds.h
similarity index 92%
rename from arch/arm/kernel/vmlinux.lds.h
rename to arch/arm/include/asm/vmlinux.lds.h
index 381a8e105fa5..3d88ea74f4cd 100644
--- a/arch/arm/kernel/vmlinux.lds.h
+++ b/arch/arm/include/asm/vmlinux.lds.h
@@ -1,4 +1,5 @@
 /* SPDX-License-Identifier: GPL-2.0 */
+#include 
 
 #ifdef CONFIG_HOTPLUG_CPU
 #define ARM_CPU_DISCARD(x)
@@ -37,6 +38,13 @@
*(.idmap.text)  \
__idmap_text_end = .;   \
 
+#define ARM_COMMON_DISCARD \
+   *(.ARM.attributes)  \
+   *(.iplt) *(.rel.iplt) *(.igot.plt)  \
+   *(.modinfo) \
+   *(.discard) \
+   *(.discard.*)
+
 #define ARM_DISCARD\
*(.ARM.exidx.exit.text) \
*(.ARM.extab.exit.text) \
@@ -49,8 +57,14 @@
EXIT_CALL   \
ARM_MMU_DISCARD(*(.text.fixup)) \
ARM_MMU_DISCARD(*(__ex_table))  \
-   *(.discard) \
-   *(.discard.*)
+   ARM_COMMON_DISCARD
+
+#define ARM_STUBS_TEXT \
+   *(.gnu.warning) \
+   *(.glue_7t) \
+   *(.glue_7)  \
+   *(.vfp11_veneer)\
+   *(.v4_bx)
 
 #define ARM_TEXT   \
IDMAP_TEXT  \
@@ -64,9 +78,7 @@
CPUIDLE_TEXT\
LOCK_TEXT   \
KPROBES_TEXT\
-   *(.gnu.warning) \
-   *(.glue_7)  \
-   *(.glue_7t) \
+   ARM_STUBS_TEXT  \
. = ALIGN(4);   \
*(.got) /* Global offset table */   \
ARM_CPU_KEEP(PROC_INFO)
diff --git a/arch/arm/kernel/vmlinux-xip.lds.S 
b/arch/arm/kernel/vmlinux-xip.lds.S
index 6d2be994ae58..0807f40844a2 100644
--- a/arch/arm/kernel/vmlinux-xip.lds.S
+++ b/arch/arm/kernel/vmlinux-xip.lds.S
@@ -9,15 +9,13 @@
 
 #include 
 
-#include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 
-#include "vmlinux.lds.h"
-
 OUTPUT_ARCH(arm)
 ENTRY(stext)
 
@@ -152,6 +150,7 @@ SECTIONS

[PATCH] checkpatch: fix CONST_STRUCT when const_structs.checkpatch is missing

2020-06-22 Thread Quentin Monnet
Checkpatch reports warnings when some specific structs are not declared
as const in the code. The list of structs to consider was initially
defined in the checkpatch.pl script itself, but it was later moved to an
external file (scripts/const_structs.checkpatch). This introduced two
minor issues:

- When file scripts/const_structs.checkpatch is not present (for
  example, if checkpatch is run outside of the kernel directory with the
  "--no-tree" option), a warning is printed to stderr to tell the user
  that "No structs that should be const will be found". This is fair,
  but the warning is printed unconditionally, even if the option
  "--ignore CONST_STRUCT" is passed. In the latter case, we explicitly
  ask checkpatch to skip this check, so no warning should be printed.

- When scripts/const_structs.checkpatch is missing, or even when trying
  to silence the warning by adding an empty file, $const_structs is set
  to "", and the regex used for finding structs that should be const,
  "$line =~ /\bstruct\s+($const_structs)\b(?!\s*\{)/)", matches all
  structs found in the code, thus reporting a number of false positives.

Let's fix the first item by skipping scripts/const_structs.checkpatch
processing if "CONST_STRUCT" checks are ignored, and the second one by
skipping the test if $const_structs is an empty string.
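
For illustration, here is a standalone rendering of the second issue
using PCRE2 (an assumption: PCRE2 matches Perl's semantics for \b, empty
groups, and lookaheads, which it does for this pattern). With
$const_structs empty, the pattern degenerates to \bstruct\s+()\b(?!\s*\{)
and matches any struct usage:

/* Build with: cc demo.c $(pcre2-config --libs8) */
#include <stdio.h>
#include <string.h>
#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>

int main(void)
{
	/* checkpatch's pattern with $const_structs == "" */
	PCRE2_SPTR pat = (PCRE2_SPTR)"\\bstruct\\s+()\\b(?!\\s*\\{)";
	PCRE2_SPTR subj = (PCRE2_SPTR)"static struct file_operations fops;";
	int errcode;
	PCRE2_SIZE erroff;
	pcre2_code *re = pcre2_compile(pat, PCRE2_ZERO_TERMINATED, 0,
				       &errcode, &erroff, NULL);
	pcre2_match_data *md = pcre2_match_data_create_from_pattern(re, NULL);
	int rc = pcre2_match(re, subj, strlen((const char *)subj),
			     0, 0, md, NULL);

	/* rc > 0: every struct matches -> a CONST_STRUCT false positive */
	printf("%s\n", rc > 0 ? "matched (false positive)" : "no match");
	pcre2_match_data_free(md);
	pcre2_code_free(re);
	return 0;
}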

Fixes: bf1fa1dae68e ("checkpatch: externalize the structs that should be const")
Signed-off-by: Quentin Monnet 
---
 scripts/checkpatch.pl | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index b06093777fd8..dcbf4ff5d445 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -781,8 +781,10 @@ sub read_words {
 }
 
 my $const_structs = "";
-read_words(\$const_structs, $conststructsfile)
-or warn "No structs that should be const will be found - file 
'$conststructsfile': $!\n";
+if (show_type("CONST_STRUCT")) {
+   read_words(\$const_structs, $conststructsfile)
+   or warn "No structs that should be const will be found - file 
'$conststructsfile': $!\n";
+}
 
 my $typeOtherTypedefs = "";
 if (length($typedefsfile)) {
@@ -6660,7 +6662,8 @@ sub process {
 
 # check for various structs that are normally const (ops, kgdb, device_tree)
 # and avoid what seem like struct definitions 'struct foo {'
-   if ($line !~ /\bconst\b/ &&
+   if ($const_structs ne "" &&
+   $line !~ /\bconst\b/ &&
$line =~ /\bstruct\s+($const_structs)\b(?!\s*\{)/) {
WARN("CONST_STRUCT",
 "struct $1 should normally be const\n" . 
$herecurr);
-- 
2.20.1



[PATCH v2 0/2] arm: Warn on orphan section placement

2020-06-22 Thread Kees Cook
v2:
- split by architecture, rebase to v5.8-rc2
v1: https://lore.kernel.org/lkml/20200228002244.15240-1-keesc...@chromium.org/

A recent bug[1] was solved for builds linked with ld.lld, and tracking
it down took way longer than it needed to (a year). Ultimately, it
boiled down to differences between ld.bfd and ld.lld's handling of
orphan sections. Similarly, the recent FGKASLR series brought up orphan
section handling too[2]. In both cases, it would have been nice if the
linker was running with --orphan-handling=warn so that surprise sections
wouldn't silently get mapped into the kernel image at locations up to the
whim of the linker's orphan handling logic. Instead, all desired sections
should be explicitly identified in the linker script (to be either kept or
discarded) with any orphans throwing a warning. The powerpc architecture
actually already does this, so this series extends coverage to arm.

This series needs one additional commit that is not yet in
any tree, but I hope to have it landed via x86 -tip shortly:
https://lore.kernel.org/lkml/20200228002244.15240-3-keesc...@chromium.org/

Thanks!

-Kees

[1] https://github.com/ClangBuiltLinux/linux/issues/282
[2] https://lore.kernel.org/lkml/202002242122.AA4D1B8@keescook/

Kees Cook (2):
  arm/build: Warn on orphan section placement
  arm/boot: Warn on orphan section placement

 arch/arm/Makefile |  4 
 arch/arm/boot/compressed/Makefile |  2 ++
 arch/arm/boot/compressed/vmlinux.lds.S| 17 ++
 .../arm/{kernel => include/asm}/vmlinux.lds.h | 22 ++-
 arch/arm/kernel/vmlinux-xip.lds.S |  5 ++---
 arch/arm/kernel/vmlinux.lds.S |  5 ++---
 6 files changed, 34 insertions(+), 21 deletions(-)
 rename arch/arm/{kernel => include/asm}/vmlinux.lds.h (92%)

-- 
2.25.1



Re: [PATCH] initrd: Remove erroneous comment

2020-06-22 Thread H. Peter Anvin
On 2020-06-22 13:40, Tom Rini wrote:
> On Mon, Jun 22, 2020 at 01:02:16PM -0700, ron minnich wrote:
> 
>> The other thing you ought to consider fixing:
>> initrd is documented as follows:
>>
>> initrd= [BOOT] Specify the location of the initial ramdisk
>>
>> for bootloaders only.
>>
>> UEFI consumes initrd from the command line as well. As ARM servers
>> increasingly use UEFI, there may be situations in which the initrd
>> option doesn't make its way to the kernel? I don't know, UEFI is such
>> a black box to me. But I've seen this "initrd consumption" happen.
>>
>> Based on docs, and the growing use of bootloaders that are happy to
>> consume initrd= and not pass it to the kernel, you might be better off
>> trying to move to the new command line option anyway.
>>
>> IOW, this comment may not be what people want to see, but ... it might
>> also be right. Or possibly changed to:
>>
>> /*
>>  * The initrd keyword is in use today on ARM, PowerPC, and MIPS.
>>  * It is also reserved for use by bootloaders such as UEFI and may
>>  * be consumed by them and not passed on to the kernel.
>>  * The documentation also shows it as reserved for bootloaders.
>>  * It is advised to move to the initrdmem= option wherever possible.
>>  */
> 
> Fair warning, one of the other hats I wear is the chief custodian of the
> U-Boot project.
> 
> Note that on most architectures in modern times the device tree is used
> to pass in initrd type information and "initrd=" on the command line is
> quite legacy.
> 
> But what do you mean UEFI "consumes" initrd= ?  It's quite expected that
> when you configure grub/syslinux/systemd-boot/whatever via extlinux.conf
> or similar with "initrd /some/file" something reasonable happens to
> read that in to memory and pass along the location to Linux (which can
> vary from arch to arch, when not using device tree).  I guess looking at 
> Documentation/x86/boot.rst is where treating initrd= as a file that
> should be handled and ramdisk_image / ramdisk_size set came from.  I do
> wonder what happens in the case of ARM/ARM64 + UEFI without device tree.
> 

UEFI plus the in-kernel UEFI stub is, in some ways, a "bootloader" in
the traditional sense. It is totally fair that we should update the
documentation with this as a different case, though, because it is part
of the kernel tree and so the kernel now has partial ownership of the
namespace.

I suggest "STUB" for "in-kernel firmware stub" for this purpose; no need
to restrict it to a specific firmware for the purpose of namespace
reservation.

-hpa


[PATCH] drm/gma500: Fix direction check in psb_accel_2d_copy()

2020-06-22 Thread Denis Efremov
psb_accel_2d_copy() checks direction PSB_2D_COPYORDER_BR2TL twice.
Based on psb_accel_2d_copy_direction() results, PSB_2D_COPYORDER_TL2BR
should be checked instead in the second direction check.

Fixes: 4d8d096e9ae8 ("gma500: introduce the framebuffer support code")
Cc: sta...@vger.kernel.org
Signed-off-by: Denis Efremov 
---
 drivers/gpu/drm/gma500/accel_2d.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/gma500/accel_2d.c 
b/drivers/gpu/drm/gma500/accel_2d.c
index adc0507545bf..8dc86aac54d2 100644
--- a/drivers/gpu/drm/gma500/accel_2d.c
+++ b/drivers/gpu/drm/gma500/accel_2d.c
@@ -179,8 +179,8 @@ static int psb_accel_2d_copy(struct drm_psb_private 
*dev_priv,
src_x += size_x - 1;
dst_x += size_x - 1;
}
-   if (direction == PSB_2D_COPYORDER_BR2TL ||
-   direction == PSB_2D_COPYORDER_BL2TR) {
+   if (direction == PSB_2D_COPYORDER_BL2TR ||
+   direction == PSB_2D_COPYORDER_TL2BR) {
src_y += size_y - 1;
dst_y += size_y - 1;
}
-- 
2.26.2



Re: [PATCH resend] net: cxgb4: fix return error value in t4_prep_fw

2020-06-22 Thread Jakub Kicinski
On Sat, 20 Jun 2020 10:49:21 +0800 Li Heng wrote:
> t4_prep_fw goes to the bye tag with a positive return value when
> something bad happens, which means adap_init0 cannot free resources.
> So fix it to return a negative value.
> 
> Fixes: 16e47624e76b ("cxgb4: Add new scheme to update T4/T5 firmware")
> Reported-by: Hulk Robot 
> Signed-off-by: Li Heng 
> Signed-off-by: Jakub Kicinski 

I don't remember signing off on this..




[PATCH v3 0/4] mfd: lp87565: convert DT to yaml, ignore ENx pins and add LP87524-Q1

2020-06-22 Thread Luca Ceresoli
Hi,

the first patch in this series is a small but significant variation in how
the lp87565 driver enables the output rails, to allow the kernel to always
know when it is enabling an output. However it can change existing
behaviour (depending on the hardware setup) and thus it should be carefully
evaluated.

The following patches are a fairly straightforward addition of a new chip
variant along DT bindings conversion to yaml.

v3 fixes the yaml errors present in v2.

RFC,v1: https://lkml.org/lkml/2020/6/3/908
v2: https://lkml.org/lkml/2020/6/17/492

Luca

Luca Ceresoli (4):
  regulator: lp87565: enable voltage regardless of ENx pin
  dt-bindings: mfd: lp87565: convert to yaml
  dt-bindings: mfd: lp87565: add LP87524-Q1 variant
  mfd: lp87565: add LP87524-Q1 variant

 .../devicetree/bindings/mfd/lp87565.txt   |  79 --
 .../devicetree/bindings/mfd/ti,lp875xx.yaml   | 225 ++
 drivers/mfd/lp87565.c |   4 +
 drivers/regulator/lp87565-regulator.c |  21 +-
 include/linux/mfd/lp87565.h   |   1 +
 5 files changed, 249 insertions(+), 81 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/mfd/lp87565.txt
 create mode 100644 Documentation/devicetree/bindings/mfd/ti,lp875xx.yaml

-- 
2.27.0



Re: [PATCH v2 2/4] dt-bindings: mfd: lp87565: convert to yaml

2020-06-22 Thread Luca Ceresoli
Hi Rob,

On 17/06/20 19:11, Rob Herring wrote:
> On Wed, 17 Jun 2020 15:11:43 +0200, Luca Ceresoli wrote:
>> The definition of "xxx-in-supply" was generic, thus define in detail the
>> possible cases for each chip variant.
>>
>> Also document that the only possible I2C slave address is 0x60 as per the
>> datasheet and fix the second example accordingly.
>>
>> Signed-off-by: Luca Ceresoli 
>>
>> ---
>>
>> Changes in v2:
>>  - this patch replaces patch "regulator: lp87565: dt: remove duplicated
>>section" in RFC,v1 (Rob Herring)
>>  - use capital letters consistently (Lee Jones)
>>  - replace "regulator" -> "mfd" in subject line (Lee Jones)
>>  - replace "dt:" suffix with "dt-bindings:" prefix in subject line
>> ---
>>  .../devicetree/bindings/mfd/lp87565.txt   |  79 ---
>>  .../devicetree/bindings/mfd/ti,lp875xx.yaml   | 134 ++
>>  2 files changed, 134 insertions(+), 79 deletions(-)
>>  delete mode 100644 Documentation/devicetree/bindings/mfd/lp87565.txt
>>  create mode 100644 Documentation/devicetree/bindings/mfd/ti,lp875xx.yaml
>>
> 
> 
> My bot found errors running 'make dt_binding_check' on your patch:
> 
> Error: Documentation/devicetree/bindings/mfd/ti,lp875xx.example.dts:20.9-14 
> syntax error
> FATAL ERROR: Unable to parse input tree
> scripts/Makefile.lib:315: recipe for target 
> 'Documentation/devicetree/bindings/mfd/ti,lp875xx.example.dt.yaml' failed
> make[1]: *** 
> [Documentation/devicetree/bindings/mfd/ti,lp875xx.example.dt.yaml] Error 1
> make[1]: *** Waiting for unfinished jobs
> Makefile:1347: recipe for target 'dt_binding_check' failed
> make: *** [dt_binding_check] Error 2

Apologies, v3 incoming with these fixed (and dt_binding_check run in its
entirety this time).

Thanks,
-- 
Luca


[PATCH v3 3/4] dt-bindings: mfd: lp87565: add LP87524-Q1 variant

2020-06-22 Thread Luca Ceresoli
Add the LP87524-Q1 to the LP87565 bindings document along with an example.

Signed-off-by: Luca Ceresoli 

---

Changes in v3:
 - fix yaml errors

Changes in v2:
 - RFC,v1 was based on the txt file, rewrite for yaml
 - use uppercase consistently in model names (Lee Jones)
 - replace "regulator" -> "mfd" in subject line (Lee Jones)
 - replace "dt:" suffix with "dt-bindings:" prefix in subject line
---
 .../devicetree/bindings/mfd/ti,lp875xx.yaml   | 83 +++
 1 file changed, 83 insertions(+)

diff --git a/Documentation/devicetree/bindings/mfd/ti,lp875xx.yaml 
b/Documentation/devicetree/bindings/mfd/ti,lp875xx.yaml
index 2da703918d6a..e6fdf61e89a8 100644
--- a/Documentation/devicetree/bindings/mfd/ti,lp875xx.yaml
+++ b/Documentation/devicetree/bindings/mfd/ti,lp875xx.yaml
@@ -15,6 +15,7 @@ properties:
   - const: ti,lp87565
   - const: ti,lp87565-q1
   - const: ti,lp87561-q1
+  - const: ti,lp87524-q1
 
   reg:
 description: I2C slave address
@@ -72,6 +73,36 @@ allOf:
   required:
 - buck3210-in-supply
 
+  - if:
+  properties:
+compatible:
+  contains:
+enum:
+  - ti,lp87524-q1
+then:
+  properties:
+buck0-in-supply:
+  description:
+Phandle to parent supply node for the BUCK0 converter.
+
+buck1-in-supply:
+  description:
+Phandle to parent supply node for the BUCK1 converter.
+
+buck2-in-supply:
+  description:
+Phandle to parent supply node for the BUCK2 converter.
+
+buck3-in-supply:
+  description:
+Phandle to parent supply node for the BUCK3 converter.
+
+  required:
+- buck0-in-supply
+- buck1-in-supply
+- buck2-in-supply
+- buck3-in-supply
+
 examples:
   - |
 /* TI LP87565-Q1 PMIC (dual 2-phase output configuration) */
@@ -139,4 +170,56 @@ examples:
 };
 };
 
+  - |
+/* TI LP87524-Q1 PMIC (four 1-phase output configuration) */
+i2c@0 {
+reg = <0x0 0x100>;
+#address-cells = <1>;
+#size-cells = <0>;
+
+pmic@60 {
+compatible = "ti,lp87524-q1";
+reg = <0x60>;
+gpio-controller;
+#gpio-cells = <2>;
+
+buck0-in-supply = <_5v0>;
+buck1-in-supply = <_5v0>;
+buck2-in-supply = <_5v0>;
+buck3-in-supply = <_5v0>;
+
+regulators {
+buck0_reg: buck0 {
+regulator-name = "buck0";
+regulator-min-microvolt = <3300000>;
+regulator-max-microvolt = <3300000>;
+regulator-always-on;
+};
+
+buck1_reg: buck1 {
+regulator-name = "buck1";
+regulator-min-microvolt = <1350000>;
+regulator-max-microvolt = <1350000>;
+regulator-always-on;
+};
+
+buck2_reg: buck2 {
+regulator-name = "buck2";
+regulator-min-microvolt = <950000>;
+regulator-max-microvolt = <950000>;
+regulator-always-on;
+};
+
+buck3_reg: buck3 {
+regulator-name = "buck3";
+regulator-min-microvolt = <1800000>;
+regulator-max-microvolt = <1800000>;
+regulator-always-on;
+};
+};
+};
+};
+
+
+
 ...
-- 
2.27.0



[PATCH v3 2/4] dt-bindings: mfd: lp87565: convert to yaml

2020-06-22 Thread Luca Ceresoli
The definition of "xxx-in-supply" was generic, thus define in detail the
possible cases for each chip variant.

Also document that the only possible I2C slave address is 0x60 as per the
datasheet and fix the second example accordingly.

Signed-off-by: Luca Ceresoli 

---

Changes in v3:
 - fix yaml errors

Changes in v2:
 - this patch replaces patch "regulator: lp87565: dt: remove duplicated
   section" in RFC,v1 (Rob Herring)
 - use capital letters consistently (Lee Jones)
 - replace "regulator" -> "mfd" in subject line (Lee Jones)
 - replace "dt:" suffix with "dt-bindings:" prefix in subject line
---
 .../devicetree/bindings/mfd/lp87565.txt   |  79 --
 .../devicetree/bindings/mfd/ti,lp875xx.yaml   | 142 ++
 2 files changed, 142 insertions(+), 79 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/mfd/lp87565.txt
 create mode 100644 Documentation/devicetree/bindings/mfd/ti,lp875xx.yaml

diff --git a/Documentation/devicetree/bindings/mfd/lp87565.txt 
b/Documentation/devicetree/bindings/mfd/lp87565.txt
deleted file mode 100644
index 41671e0dc26b..000000000000
--- a/Documentation/devicetree/bindings/mfd/lp87565.txt
+++ /dev/null
@@ -1,79 +0,0 @@
-TI LP87565 PMIC MFD driver
-
-Required properties:
-  - compatible:"ti,lp87565", "ti,lp87565-q1"
-  - reg:   I2C slave address.
-  - gpio-controller:   Marks the device node as a GPIO Controller.
-  - #gpio-cells:   Should be two.  The first cell is the pin number and
-   the second cell is used to specify flags.
-   See ../gpio/gpio.txt for more information.
-  - xxx-in-supply: Phandle to parent supply node of each regulator
-   populated under regulators node. xxx should match
-   the supply_name populated in driver.
-Example:
-
-lp87565_pmic: pmic@60 {
-   compatible = "ti,lp87565-q1";
-   reg = <0x60>;
-   gpio-controller;
-   #gpio-cells = <2>;
-
-   buck10-in-supply = <_3v3>;
-   buck23-in-supply = <_3v3>;
-
-   regulators: regulators {
-   buck10_reg: buck10 {
-   /* VDD_MPU */
-   regulator-name = "buck10";
-   regulator-min-microvolt = <850000>;
-   regulator-max-microvolt = <1250000>;
-   regulator-always-on;
-   regulator-boot-on;
-   };
-
-   buck23_reg: buck23 {
-   /* VDD_GPU */
-   regulator-name = "buck23";
-   regulator-min-microvolt = <850000>;
-   regulator-max-microvolt = <1250000>;
-   regulator-boot-on;
-   regulator-always-on;
-   };
-   };
-};
-
-TI LP87561 PMIC:
-
-This is a single output 4-phase regulator configuration
-
-Required properties:
-  - compatible:"ti,lp87561-q1"
-  - reg:   I2C slave address.
-  - gpio-controller:   Marks the device node as a GPIO Controller.
-  - #gpio-cells:   Should be two.  The first cell is the pin number and
-   the second cell is used to specify flags.
-   See ../gpio/gpio.txt for more information.
-  - xxx-in-supply: Phandle to parent supply node of each regulator
-   populated under regulators node. xxx should match
-   the supply_name populated in driver.
-Example:
-
-lp87561_pmic: pmic@62 {
-   compatible = "ti,lp87561-q1";
-   reg = <0x62>;
-   gpio-controller;
-   #gpio-cells = <2>;
-
-   buck3210-in-supply = <_3v3>;
-
-   regulators: regulators {
-   buck3210_reg: buck3210 {
-   /* VDD_CORE */
-   regulator-name = "buck3210";
-   regulator-min-microvolt = <800000>;
-   regulator-max-microvolt = <800000>;
-   regulator-always-on;
-   regulator-boot-on;
-   };
-   };
-};
diff --git a/Documentation/devicetree/bindings/mfd/ti,lp875xx.yaml 
b/Documentation/devicetree/bindings/mfd/ti,lp875xx.yaml
new file mode 100644
index 000000000000..2da703918d6a
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/ti,lp875xx.yaml
@@ -0,0 +1,142 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/mfd/ti,lp875xx.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: TI LP875xx PMIC MFD driver
+
+maintainers:
+  - Keerthy 
+
+properties:
+  compatible:
+oneOf:
+  - const: ti,lp87565
+  - const: ti,lp87565-q1
+  - const: ti,lp87561-q1
+
+  reg:
+description: I2C slave address
+const: 0x60
+
+  gpio-controller: true
+
+  '#gpio-cells':
+description:
+  The first cell is the pin number.
+  The second cell is used to specify flags.
+  See ../gpio/gpio.txt for 

[PATCH v3 1/4] regulator: lp87565: enable voltage regardless of ENx pin

2020-06-22 Thread Luca Ceresoli
This driver enables outputs by setting bit EN_BUCKn in the BUCKn_CTRL1
register. However, if bit EN_PIN_CTRLn in the same register is set, the
output is actually enabled only if EN_BUCKn is set AND an enable pin is
active. Since the driver does not touch EN_PIN_CTRLn, the choice is left to
the hardware, which in turn gets this bit from OTP memory, and in absence
of OTP data it uses a default value that is documented in the datasheet for
LP8752x, but not for LP8756x.

Thus the driver doesn't really "know" whether it is actually enabling the
output or not.

In order to make sure activation is always driver-controlled, just clear
the EN_PIN_CTRLn bit. Now all activation depends solely on the EN_BUCKn bit.
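
The mechanism relies on how the regulator core applies enable_mask and
enable_val: regulator_enable_regmap() does a regmap_update_bits() with
exactly these arguments, so mask bits that are zero in enable_val get
cleared on enable. A minimal userspace model (bit positions are
illustrative, not the real register layout):

#include <stdio.h>

#define EN_BUCKn	(1u << 7)
#define EN_PIN_CTRLn	(1u << 6)

/* Same read-modify-write the regulator core does via regmap_update_bits():
 * bits in 'mask' that are zero in 'val' get cleared. */
static unsigned int update_bits(unsigned int reg, unsigned int mask,
				unsigned int val)
{
	return (reg & ~mask) | (val & mask);
}

int main(void)
{
	unsigned int ctrl1 = EN_PIN_CTRLn; /* OTP left pin control enabled */

	/* enable_mask now includes EN_PIN_CTRLn, enable_val does not, so
	 * enabling the rail also clears pin control: */
	ctrl1 = update_bits(ctrl1, EN_BUCKn | EN_PIN_CTRLn, EN_BUCKn);

	printf("EN=%u PIN_CTRL=%u\n",
	       !!(ctrl1 & EN_BUCKn), !!(ctrl1 & EN_PIN_CTRLn)); /* EN=1 PIN_CTRL=0 */
	return 0;
}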

Signed-off-by: Luca Ceresoli 

---

As discussed in RFC,v1 [0] there is a potential regression on existing
hardware, see the discussion for more details. So far Mark Brown kind of
approved the idea behind this patch, but more discussion about the correct
way to handle this situation would be greatly appreciated.

[0] https://lkml.org/lkml/2020/6/3/907

Changes in v3: none
Changes in v2: none
---
 drivers/regulator/lp87565-regulator.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/regulator/lp87565-regulator.c 
b/drivers/regulator/lp87565-regulator.c
index 5d525dacf959..fbed6bc80c1a 100644
--- a/drivers/regulator/lp87565-regulator.c
+++ b/drivers/regulator/lp87565-regulator.c
@@ -11,8 +11,8 @@
 
 #include 
 
-#define LP87565_REGULATOR(_name, _id, _of, _ops, _n, _vr, _vm, _er, _em, \
-_delay, _lr, _cr)  \
+#define LP87565_REGULATOR(_name, _id, _of, _ops, _n, _vr, _vm, \
+ _er, _em, _ev, _delay, _lr, _cr)  \
[_id] = {   \
.desc = {   \
.name   = _name,\
@@ -28,6 +28,7 @@
.vsel_mask  = _vm,  \
.enable_reg = _er,  \
.enable_mask= _em,  \
+   .enable_val = _ev,  \
.ramp_delay = _delay,   \
.linear_ranges  = _lr,  \
.n_linear_ranges= ARRAY_SIZE(_lr),  \
@@ -121,38 +122,54 @@ static const struct lp87565_regulator regulators[] = {
LP87565_REGULATOR("BUCK0", LP87565_BUCK_0, "buck0", lp87565_buck_ops,
  256, LP87565_REG_BUCK0_VOUT, LP87565_BUCK_VSET,
  LP87565_REG_BUCK0_CTRL_1,
+ LP87565_BUCK_CTRL_1_EN |
+ LP87565_BUCK_CTRL_1_EN_PIN_CTRL,
  LP87565_BUCK_CTRL_1_EN, 3230,
  buck0_1_2_3_ranges, LP87565_REG_BUCK0_CTRL_2),
LP87565_REGULATOR("BUCK1", LP87565_BUCK_1, "buck1", lp87565_buck_ops,
  256, LP87565_REG_BUCK1_VOUT, LP87565_BUCK_VSET,
  LP87565_REG_BUCK1_CTRL_1,
+ LP87565_BUCK_CTRL_1_EN |
+ LP87565_BUCK_CTRL_1_EN_PIN_CTRL,
  LP87565_BUCK_CTRL_1_EN, 3230,
  buck0_1_2_3_ranges, LP87565_REG_BUCK1_CTRL_2),
LP87565_REGULATOR("BUCK2", LP87565_BUCK_2, "buck2", lp87565_buck_ops,
  256, LP87565_REG_BUCK2_VOUT, LP87565_BUCK_VSET,
  LP87565_REG_BUCK2_CTRL_1,
+ LP87565_BUCK_CTRL_1_EN |
+ LP87565_BUCK_CTRL_1_EN_PIN_CTRL,
  LP87565_BUCK_CTRL_1_EN, 3230,
  buck0_1_2_3_ranges, LP87565_REG_BUCK2_CTRL_2),
LP87565_REGULATOR("BUCK3", LP87565_BUCK_3, "buck3", lp87565_buck_ops,
  256, LP87565_REG_BUCK3_VOUT, LP87565_BUCK_VSET,
  LP87565_REG_BUCK3_CTRL_1,
+ LP87565_BUCK_CTRL_1_EN |
+ LP87565_BUCK_CTRL_1_EN_PIN_CTRL,
  LP87565_BUCK_CTRL_1_EN, 3230,
  buck0_1_2_3_ranges, LP87565_REG_BUCK3_CTRL_2),
LP87565_REGULATOR("BUCK10", LP87565_BUCK_10, "buck10", lp87565_buck_ops,
  256, LP87565_REG_BUCK0_VOUT, LP87565_BUCK_VSET,
  LP87565_REG_BUCK0_CTRL_1,
  LP87565_BUCK_CTRL_1_EN |
+ LP87565_BUCK_CTRL_1_EN_PIN_CTRL |
+ LP87565_BUCK_CTRL_1_FPWM_MP_0_2,
+ LP87565_BUCK_CTRL_1_EN |
  LP87565_BUCK_CTRL_1_FPWM_MP_0_2, 3230,
  buck0_1_2_3_ranges, LP87565_REG_BUCK0_CTRL_2),
LP87565_REGULATOR("BUCK23", 

[PATCH v3 4/4] mfd: lp87565: add LP87524-Q1 variant

2020-06-22 Thread Luca Ceresoli
Add support for the LP87524B/J/P-Q1 Four 4-MHz Buck Converter. This is a
variant of the LP87565 having 4 single-phase outputs and up to 10 A of
total output current.

Signed-off-by: Luca Ceresoli 
Acked-for-MFD-by: Lee Jones 

---

Changes in v3: none

Changes in v2:
 - replace "regulator" -> "mfd" in subject line (Lee Jones)
 - add Acked-for-MFD-by: from Lee Jones
---
 drivers/mfd/lp87565.c   | 4 
 include/linux/mfd/lp87565.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/mfd/lp87565.c b/drivers/mfd/lp87565.c
index 4a5c8ade4ae0..cc1072927f6d 100644
--- a/drivers/mfd/lp87565.c
+++ b/drivers/mfd/lp87565.c
@@ -26,6 +26,10 @@ static const struct mfd_cell lp87565_cells[] = {
 
 static const struct of_device_id of_lp87565_match_table[] = {
{ .compatible = "ti,lp87565", },
+   {
+   .compatible = "ti,lp87524-q1",
+   .data = (void *)LP87565_DEVICE_TYPE_LP87524_Q1,
+   },
{
.compatible = "ti,lp87565-q1",
.data = (void *)LP87565_DEVICE_TYPE_LP87565_Q1,
diff --git a/include/linux/mfd/lp87565.h b/include/linux/mfd/lp87565.h
index ce965354bbad..ad240f2d0d3f 100644
--- a/include/linux/mfd/lp87565.h
+++ b/include/linux/mfd/lp87565.h
@@ -14,6 +14,7 @@
 
 enum lp87565_device_type {
LP87565_DEVICE_TYPE_UNKNOWN = 0,
+   LP87565_DEVICE_TYPE_LP87524_Q1,
LP87565_DEVICE_TYPE_LP87561_Q1,
LP87565_DEVICE_TYPE_LP87565_Q1,
 };
-- 
2.27.0



Re: [PATCH] initrd: Remove erroneous comment

2020-06-22 Thread Tom Rini
On Mon, Jun 22, 2020 at 01:02:16PM -0700, ron minnich wrote:

> The other thing you ought to consider fixing:
> initrd is documented as follows:
> 
> initrd= [BOOT] Specify the location of the initial ramdisk
> 
> for bootloaders only.
> 
> UEFI consumes initrd from the command line as well. As ARM servers
> increasingly use UEFI, there may be situations in which the initrd
> option doesn't make its way to the kernel? I don't know, UEFI is such
> a black box to me. But I've seen this "initrd consumption" happen.
> 
> Based on docs, and the growing use of bootloaders that are happy to
> consume initrd= and not pass it to the kernel, you might be better off
> trying to move to the new command line option anyway.
> 
> IOW, this comment may not be what people want to see, but ... it might
> also be right. Or possibly changed to:
> 
> /*
>  * The initrd keyword is in use today on ARM, PowerPC, and MIPS.
>  * It is also reserved for use by bootloaders such as UEFI and may
>  * be consumed by them and not passed on to the kernel.
>  * The documentation also shows it as reserved for bootloaders.
>  * It is advised to move to the initrdmem= option wherever possible.
>  */

Fair warning, one of the other hats I wear is the chief custodian of the
U-Boot project.

Note that on most architectures in modern times the device tree is used
to pass in initrd type information and "initrd=" on the command line is
quite legacy.

But what do you mean UEFI "consumes" initrd= ?  It's quite expected that
when you configure grub/syslinux/systemd-boot/whatever via extlinux.conf
or similar with "initrd /some/file" something reasonable happens to
read that in to memory and pass along the location to Linux (which can
vary from arch to arch, when not using device tree).  I guess looking at 
Documentation/x86/boot.rst is where treating initrd= as a file that
should be handled and ramdisk_image / ramdisk_size set came from.  I do
wonder what happens in the case of ARM/ARM64 + UEFI without device tree.

That said, no the comment is wrong.  It's not "since 11/2018" but "since
the 1990s".  And it doesn't provide any sort of link / context to the
boot loader specification project or similar that explains the cases
when a non-filename "initrd=" would reasonably (or unreasonably but
happens in reality) be removed.

I would go so far as to suggest that adding special handling for some
x86 setups is the wrong place to start / further deprecate how other
architectures and firmwares handle a given situation.  I'm only chiming
in here as I saw this commit go by on LWN and wanted to see how this was
different from the traditional usage of initrd= in the rest of the
kernel (it's not) and then saw the otherwise unrelated new comment being
added.

-- 
Tom




Re: [PATCH v6 17/19] mm: memcg/slab: use a single set of kmem_caches for all allocations

2020-06-22 Thread Roman Gushchin
On Mon, Jun 22, 2020 at 12:21:28PM -0700, Shakeel Butt wrote:
> On Mon, Jun 8, 2020 at 4:07 PM Roman Gushchin  wrote:
> >
> > Instead of having two sets of kmem_caches: one for system-wide and
> > non-accounted allocations and the second one shared by all accounted
> > allocations, we can use just one.
> >
> > The idea is simple: space for obj_cgroup metadata can be allocated
> > on demand and filled only for accounted allocations.
> >
> > It allows to remove a bunch of code which is required to handle
> > kmem_cache clones for accounted allocations. There is no more need
> > to create them, accumulate statistics, propagate attributes, etc.
> > It's a quite significant simplification.
> >
> > Also, because the total number of slab_caches is reduced almost twice
> > (not all kmem_caches have a memcg clone), some additional memory
> > savings are expected. On my devvm it additionally saves about 3.5%
> > of slab memory.
> >
> > Suggested-by: Johannes Weiner 
> > Signed-off-by: Roman Gushchin 
> > Reviewed-by: Vlastimil Babka 
> > ---
> [snip]
> >  static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
> >   struct obj_cgroup *objcg,
> > - size_t size, void **p)
> > + gfp_t flags, size_t size,
> > + void **p)
> >  {
> > struct page *page;
> > unsigned long off;
> > size_t i;
> >
> > +   if (!objcg)
> > +   return;
> > +
> > +   flags &= ~__GFP_ACCOUNT;
> > for (i = 0; i < size; i++) {
> > if (likely(p[i])) {
> > page = virt_to_head_page(p[i]);
> > +
> > +   if (!page_has_obj_cgroups(page) &&
> 
> The page is already linked into the kmem_cache, don't you need
> synchronization for memcg_alloc_page_obj_cgroups()?

Hm, yes, in theory we need it. I guess the reason why I've never seen
any issues here is the SLUB percpu partial list.

So in theory we need something like:

diff --git a/mm/slab.h b/mm/slab.h
index 0a31600a0f5c..44bf57815816 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -237,7 +237,10 @@ static inline int memcg_alloc_page_obj_cgroups(struct page 
*page,
if (!vec)
return -ENOMEM;
 
-   page->obj_cgroups = (struct obj_cgroup **) ((unsigned long)vec | 0x1UL);
+   if (cmpxchg(&page->obj_cgroups, 0,
+   (struct obj_cgroup **) ((unsigned long)vec | 0x1UL)))
+   kfree(vec);
+
return 0;
 }


But I wonder if we might put it under #ifdef CONFIG_SLAB?
Or any other ideas how to make it less expensive?
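
(For context, a hedged sketch of the window the cmpxchg() above closes;
the call names follow the patch, the interleaving is illustrative:)

/*
 *   task A                                 task B
 *   ------                                 ------
 *   page_has_obj_cgroups(page) -> false
 *                                          page_has_obj_cgroups(page) -> false
 *   vecA = kcalloc(...)
 *   page->obj_cgroups = vecA | 0x1
 *   page_obj_cgroups(page)[off] = objcg
 *                                          vecB = kcalloc(...)
 *                                          page->obj_cgroups = vecB | 0x1
 *
 * vecA (and the objcg pointer already stored in it) is leaked.  With
 * cmpxchg(&page->obj_cgroups, NULL, vec | 0x1) only one store wins and
 * the loser simply kfree()s its vector.
 */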

> What's the reason to remove this from charge_slab_page()?

Because at charge_slab_page() we don't know if we'll ever need
page->obj_cgroups. Some caches might have only a few or even zero
accounted objects.

> 
> > +   memcg_alloc_page_obj_cgroups(page, s, flags)) {
> > +   obj_cgroup_uncharge(objcg, 
> > obj_full_size(s));
> > +   continue;
> > +   }
> > +
> > off = obj_to_index(s, page, p[i]);
> > obj_cgroup_get(objcg);
> > page_obj_cgroups(page)[off] = objcg;


Re: [PATCH 5.7] x86/crypto: aesni: Fix build with LLVM_IAS=1

2020-06-22 Thread Nick Desaulniers
On Mon, Jun 22, 2020 at 8:50 AM Sedat Dilek  wrote:
>
> Building with LLVM_IAS=1 means using Clang's integrated assembler (IAS)
> from LLVM/Clang >= v10.0.1-rc1 instead of GNU as from GNU binutils.
> With it, I see the following breakage on Debian/testing AMD64:
>
> :15:74: error: too many positional arguments
>  PRECOMPUTE 8*3+8(%rsp), %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7,
>  ^
>  arch/x86/crypto/aesni-intel_asm.S:1598:2: note: while in macro instantiation
>  GCM_INIT %r9, 8*3 +8(%rsp), 8*3 +16(%rsp), 8*3 +24(%rsp)
>  ^
> :47:2: error: unknown use of instruction mnemonic without a 
> size suffix
>  GHASH_4_ENCRYPT_4_PARALLEL_dec %xmm9, %xmm10, %xmm11, %xmm12, %xmm13, 
> %xmm14, %xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, enc
>  ^
> arch/x86/crypto/aesni-intel_asm.S:1599:2: note: while in macro instantiation
>  GCM_ENC_DEC dec
>  ^
> :15:74: error: too many positional arguments
>  PRECOMPUTE 8*3+8(%rsp), %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7,
>  ^
> arch/x86/crypto/aesni-intel_asm.S:1686:2: note: while in macro instantiation
>  GCM_INIT %r9, 8*3 +8(%rsp), 8*3 +16(%rsp), 8*3 +24(%rsp)
>  ^
> :47:2: error: unknown use of instruction mnemonic without a 
> size suffix
>  GHASH_4_ENCRYPT_4_PARALLEL_enc %xmm9, %xmm10, %xmm11, %xmm12, %xmm13, 
> %xmm14, %xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, enc
>  ^
> arch/x86/crypto/aesni-intel_asm.S:1687:2: note: while in macro instantiation
>  GCM_ENC_DEC enc

=== I think from here to...


>
> Craig Topper suggested me in ClangBuiltLinux issue #1050:
>
> > I think the "too many positional arguments" is because the parser isn't able
> > to handle the trailing commas.
> >
> > The "unknown use of instruction mnemonic" is because the macro was named
> > GHASH_4_ENCRYPT_4_PARALLEL_DEC but its being instantiated with
> > GHASH_4_ENCRYPT_4_PARALLEL_dec I guess gas ignores case on the
> > macro instantiation, but llvm doesn't.

Yep, see also:
commit 6f5459da2b87 ("arm64: alternative: fix build with clang
integrated assembler")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6f5459da2b8736720afdbd67c4bd2d1edba7d0e3

>
> First, I removed the trailing comma in the PRECOMPUTE line.
>
> Second, I substituted:
> 1. GCM_ENC_DEC dec -> GCM_ENC_DEC DEC
> 2. GCM_ENC_DEC enc -> GCM_ENC_DEC ENC
>
> With these changes I was able to build with LLVM_IAS=1 and boot on bare metal.
>
> As llvm-toolchain I used v10.0.1-rc1+ and v11.0.0-git pre-releases:
> 1. release/10.x Git: 2dc664d578f0e9c8ea5975eed745e322fa77bffe
> 2.   master Git: 8da5b9083691b557f50f72ab099598bb291aec5f (default)
>
> Just for the sake of completeness:
> 1. CONFIG_DEBUG_INFO_DWARF4=y
> 2. OBJDUMP=llvm-objdump (passed to my make-line)
>
> Please have a look at the "llvm.rst" kernel doc for further information and
> how to pass LLVM kbuild-options to your make-line.
>
> I confirmed that this works with Linux-kernel v5.7.3 and v5.7.5 final.
>
> NOTE: This patch is on top of Linux v5.7 final.
>
> Thanks to Craig and the folks from the ClangBuiltLinux project.

===...here can be cut out from the commit message.

>
> Cc: Craig Topper 
> Cc: Craig Topper 

I'd pick one or the other email addresses, and just use that one.
Craig seems to commit to LLVM with craig.top...@intel.com, so I
recommend that one.

> Cc: Nick Desaulniers ndesaulni...@google.com

Thanks for the explicit CC, though I do monitor the below list actively.

> Cc: "ClangBuiltLinux" 
> Link: https://github.com/ClangBuiltLinux/linux/issues/1050
> Link: 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/kbuild/llvm.rst

^ probably don't need that link either.

>
> ---
>  arch/x86/crypto/aesni-intel_asm.S | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/crypto/aesni-intel_asm.S 
> b/arch/x86/crypto/aesni-intel_asm.S
> index cad6e1bfa7d5..983eb2eec51a 100644
> --- a/arch/x86/crypto/aesni-intel_asm.S
> +++ b/arch/x86/crypto/aesni-intel_asm.S
> @@ -266,7 +266,7 @@ ALL_F:  .octa 0x
> PSHUFB_XMM %xmm2, %xmm0
> movdqu %xmm0, CurCount(%arg2) # ctx_data.current_counter = iv
>
> -   PRECOMPUTE \SUBKEY, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7,
> +   PRECOMPUTE \SUBKEY, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7
> movdqu HashKey(%arg2), %xmm13
>
> CALC_AAD_HASH %xmm13, \AAD, \AADLEN, %xmm0, %xmm1, %xmm2, %xmm3, \


There's a comparison on L386
 386 .ifc \operation, dec
Also, L407, L393, L672, L808, L841, L935, L941, L947,

If we change the `\operation` macro parameter to be `DEC` instead of
`dec`, does this comparison still hold true?  I would expect not if
LLVM's integrated assembler is case sensitive?  Otherwise we're
probably missing instructions for the case of `DEC`.  In that case,

[PATCH] drm/radeon: fix fb_div check in ni_init_smc_spll_table()

2020-06-22 Thread Denis Efremov
clk_s is checked twice in a row in ni_init_smc_spll_table().
fb_div should be checked instead.

Fixes: 69e0b57a91ad ("drm/radeon/kms: add dpm support for cayman (v5)")
Cc: sta...@vger.kernel.org
Signed-off-by: Denis Efremov 
---
 drivers/gpu/drm/radeon/ni_dpm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/ni_dpm.c b/drivers/gpu/drm/radeon/ni_dpm.c
index b57c37ddd164..c7fbb7932f37 100644
--- a/drivers/gpu/drm/radeon/ni_dpm.c
+++ b/drivers/gpu/drm/radeon/ni_dpm.c
@@ -2127,7 +2127,7 @@ static int ni_init_smc_spll_table(struct radeon_device 
*rdev)
if (clk_s & ~(SMC_NISLANDS_SPLL_DIV_TABLE_CLKS_MASK >> 
SMC_NISLANDS_SPLL_DIV_TABLE_CLKS_SHIFT))
ret = -EINVAL;
 
-   if (clk_s & ~(SMC_NISLANDS_SPLL_DIV_TABLE_CLKS_MASK >> 
SMC_NISLANDS_SPLL_DIV_TABLE_CLKS_SHIFT))
+   if (fb_div & ~(SMC_NISLANDS_SPLL_DIV_TABLE_FBDIV_MASK >> 
SMC_NISLANDS_SPLL_DIV_TABLE_FBDIV_SHIFT))
ret = -EINVAL;
 
if (clk_v & ~(SMC_NISLANDS_SPLL_DIV_TABLE_CLKV_MASK >> 
SMC_NISLANDS_SPLL_DIV_TABLE_CLKV_SHIFT))
-- 
2.26.2



Re: [PATCH] trivial: fix kerneldoc comments

2020-06-22 Thread Joe Perches
On Mon, 2020-06-22 at 21:37 +0200, Julia Lawall wrote:
> Fix the parameter names in the comment to correspond to those in the
> function header.
> 
> Drop comments about return values when there is no return value.

Done by hand or script?

[]
> diff --git a/arch/mips/cavium-octeon/executive/cvmx-spi.c 
> b/arch/mips/cavium-octeon/executive/cvmx-spi.c
[]
> @@ -69,9 +69,7 @@ static cvmx_spi_callbacks_t cvmx_spi_callbacks = {
>  /**
>   * Get current SPI4 initialization callbacks
>   *
> - * @callbacks:   Pointer to the callbacks structure.to fill
> - *
> - * Returns Pointer to cvmx_spi_callbacks_t structure.
> + * @callbacks:   Pointer to the callbacks structure, to fill.

If scripted, odd comma after structure

> diff --git a/drivers/crypto/bcm/spu.c b/drivers/crypto/bcm/spu.c
[]
> @@ -519,7 +519,7 @@ u32 spum_assoc_resp_len(enum spu_cipher_mode cipher_mode,
>   * spu_aead_ivlen() - Calculate the length of the AEAD IV to be included
>   * in a SPU request after the AAD and before the payload.
>   * @cipher_mode:  cipher mode
> - * @iv_ctr_len:   initialization vector length in bytes
> + * @iv_len:   initialization vector length in bytes
>   *
>   * In Linux ~4.2 and later, the assoc_data sg includes the IV. So no need
>   * to include the IV as a separate field in the SPU request msg.
> @@ -917,7 +917,7 @@ u16 spum_cipher_req_init(u8 *spu_hdr, struct 
> spu_cipher_parms *cipher_parms)
>   * setkey() time in spu_cipher_req_init().
>   * @spu_hdr: Start of the request message header (MH field)
>   * @spu_req_hdr_len: Length in bytes of the SPU request header
> - * @isInbound:   0 encrypt, 1 decrypt
> + * @is_inbound:   0 encrypt, 1 decrypt

odd alignments

etc...




Re: [PATCH v3] venus: fix multiple encoder crash

2020-06-22 Thread Doug Anderson
Hi,

On Mon, Jun 22, 2020 at 5:16 AM Stanimir Varbanov
 wrote:
>
> From: Mansur Alisha Shaik 
>
> Currently we are considering the instances which are available
> in the core->inst list for load calculation in the min_loaded_core()
> function, but this is incorrect because by the time we call
> decide_core() for the second instance, the third instance has not
> yet filled its codec_freq_data pointer.
>
> Solve this by considering the instances whose session has started.
>
> Cc: sta...@vger.kernel.org # v5.7+
> Fixes: 4ebf969375bc ("media: venus: introduce core selection")
> Signed-off-by: Mansur Alisha Shaik 
> Signed-off-by: Stanimir Varbanov 
> ---
>
> v3: Cc stable and add Fixes tag.
>
>  drivers/media/platform/qcom/venus/pm_helpers.c | 4 
>  1 file changed, 4 insertions(+)
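
As a sketch of the guard being described (the diff body is not shown
here, so the instance list, state field, and INST_START value are
assumptions drawn from the commit message, not the actual hunk):

	list_for_each_entry(inst_pos, &core->instances, list) {
		/* Skip instances whose session has not started yet;
		 * their codec_freq_data pointer may not be filled. */
		if (inst_pos->state != INST_START)
			continue;
		/* ... include inst_pos in the load calculation ... */
	}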

The code is the same, so carrying over my tested tag [1]:

Tested-by: Douglas Anderson 

[1] 
https://lore.kernel.org/r/CAD=FV=vt8je1att8id-rpc3jtof_7ugkpc-uduspzckwi3e...@mail.gmail.com/


[PATCH] crypto: ccp - Fix use of merged scatterlists

2020-06-22 Thread John Allen
Running the crypto manager self tests with
CONFIG_CRYPTO_MANAGER_EXTRA_TESTS may result in several types of errors
when using the ccp-crypto driver:

alg: skcipher: cbc-des3-ccp encryption failed on test vector 0; 
expected_error=0, actual_error=-5 ...

alg: skcipher: ctr-aes-ccp decryption overran dst buffer on test vector 0 ...

alg: ahash: sha224-ccp test failed (wrong result) on test vector ...

These errors are the result of improper processing of scatterlists mapped
for DMA.

Given a scatterlist in which entries are merged as part of mapping the
scatterlist for DMA, the DMA length of a merged entry will reflect the
combined length of the entries that were merged. The subsequent
scatterlist entry will contain DMA information for the scatterlist entry
after the last merged entry, but the non-DMA information will be that of
the first merged entry.

The ccp driver does not take this scatterlist merging into account. To
address this, add a second scatterlist pointer to track the current
position in the DMA mapped representation of the scatterlist. Both the DMA
representation and the original representation of the scatterlist must be
tracked as while most of the driver can use just the DMA representation,
scatterlist_map_and_copy() must use the original representation and
expects the scatterlist pointer to be accurate to the original
representation.

In order to properly walk the original scatterlist, the scatterlist must
be walked until the combined lengths of the entries seen is equal to the
DMA length of the current entry being processed in the DMA mapped
representation.
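
As a concrete illustration of that walk (the entry sizes are made up for
the example; the code mirrors the helper changed below): suppose three
contiguous 2 KiB entries were merged into a single 6 KiB DMA entry.
When sg_used reaches the DMA length of that entry, the original
scatterlist must be advanced three times before the two representations
line up again:

	unsigned int sg_combined_len = 0;

	/* here wa->sg_used == sg_dma_len(wa->dma_sg) == 6 KiB */
	do {
		sg_combined_len += wa->sg->length;	/* 2K, then 4K, then 6K */
		wa->sg = sg_next(wa->sg);
	} while (wa->sg_used > sg_combined_len);	/* three iterations */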

Fixes: 63b945091a070 ("crypto: ccp - CCP device driver and interface support")
Signed-off-by: John Allen 
Cc: sta...@vger.kernel.org
---
 drivers/crypto/ccp/ccp-dev.h |  1 +
 drivers/crypto/ccp/ccp-ops.c | 37 +---
 2 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/drivers/crypto/ccp/ccp-dev.h b/drivers/crypto/ccp/ccp-dev.h
index 3f68262d9ab4..87a34d91fdf7 100644
--- a/drivers/crypto/ccp/ccp-dev.h
+++ b/drivers/crypto/ccp/ccp-dev.h
@@ -469,6 +469,7 @@ struct ccp_sg_workarea {
unsigned int sg_used;
 
struct scatterlist *dma_sg;
+   struct scatterlist *dma_sg_head;
struct device *dma_dev;
unsigned int dma_count;
enum dma_data_direction dma_dir;
diff --git a/drivers/crypto/ccp/ccp-ops.c b/drivers/crypto/ccp/ccp-ops.c
index 422193690fd4..64112c736810 100644
--- a/drivers/crypto/ccp/ccp-ops.c
+++ b/drivers/crypto/ccp/ccp-ops.c
@@ -63,7 +63,7 @@ static u32 ccp_gen_jobid(struct ccp_device *ccp)
 static void ccp_sg_free(struct ccp_sg_workarea *wa)
 {
if (wa->dma_count)
-   dma_unmap_sg(wa->dma_dev, wa->dma_sg, wa->nents, wa->dma_dir);
+   dma_unmap_sg(wa->dma_dev, wa->dma_sg_head, wa->nents, 
wa->dma_dir);
 
wa->dma_count = 0;
 }
@@ -92,6 +92,7 @@ static int ccp_init_sg_workarea(struct ccp_sg_workarea *wa, 
struct device *dev,
return 0;
 
wa->dma_sg = sg;
+   wa->dma_sg_head = sg;
wa->dma_dev = dev;
wa->dma_dir = dma_dir;
wa->dma_count = dma_map_sg(dev, sg, wa->nents, dma_dir);
@@ -104,14 +105,28 @@ static int ccp_init_sg_workarea(struct ccp_sg_workarea 
*wa, struct device *dev,
 static void ccp_update_sg_workarea(struct ccp_sg_workarea *wa, unsigned int 
len)
 {
unsigned int nbytes = min_t(u64, len, wa->bytes_left);
+   unsigned int sg_combined_len = 0;
 
if (!wa->sg)
return;
 
wa->sg_used += nbytes;
wa->bytes_left -= nbytes;
-   if (wa->sg_used == wa->sg->length) {
-   wa->sg = sg_next(wa->sg);
+   if (wa->sg_used == sg_dma_len(wa->dma_sg)) {
+   /* Advance to the next DMA scatterlist entry */
+   wa->dma_sg = sg_next(wa->dma_sg);
+
+   /* In the case that the DMA mapped scatterlist has entries
+* that have been merged, the non-DMA mapped scatterlist
+* must be advanced multiple times for each merged entry.
+* This ensures that the current non-DMA mapped entry
+* corresponds to the current DMA mapped entry.
+*/
+   do {
+   sg_combined_len += wa->sg->length;
+   wa->sg = sg_next(wa->sg);
+   } while (wa->sg_used > sg_combined_len);
+
wa->sg_used = 0;
}
 }
@@ -299,7 +314,7 @@ static unsigned int ccp_queue_buf(struct ccp_data *data, 
unsigned int from)
/* Update the structures and generate the count */
buf_count = 0;
while (sg_wa->bytes_left && (buf_count < dm_wa->length)) {
-   nbytes = min(sg_wa->sg->length - sg_wa->sg_used,
+   nbytes = min(sg_dma_len(sg_wa->dma_sg) - sg_wa->sg_used,
 dm_wa->length - buf_count);
nbytes = min_t(u64, sg_wa->bytes_left, nbytes);
 
@@ -331,11 +346,11 @@ static void 

Re: [PATCH v3 2/6] remoteproc: k3: Add TI-SCI processor control helper functions

2020-06-22 Thread Suman Anna

Hi Mathieu,

On 6/22/20 12:35 PM, Mathieu Poirier wrote:

Hi Suman,

Apologies for the late reply, this one slipped through the cracks...


No problem :)




On Fri, Jun 12, 2020 at 05:49:10PM -0500, Suman Anna wrote:

Texas Instruments' K3 generation SoCs have specific modules/register
spaces used for configuring the various aspects of a remote processor.
These include power, reset, boot vector and other configuration features
specific to each compute processor present on the SoC. These registers
are managed by the System Controller such as DMSC on K3 AM65x SoCs.

The Texas Instrument's System Control Interface (TI-SCI) Message Protocol
is used to communicate to the System Controller from various compute
processors to invoke specific services provided by the firmware running
on the System Controller.

Add a common processor control interface header file that can be used by
multiple remoteproc drivers. The helper functions within this header file
abstract the various TI SCI protocol ops for the remoteproc drivers, and
allow them to request the System Controller to be able to program and
manage various remote processors on the SoC. The remoteproc drivers are
expected to manage the life-cycle of their ti_sci_proc_dev local
structures.
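
As a usage sketch, a client remoteproc driver would embed and drive one
of these structures roughly as follows (the k3_* names here are
illustrative placeholders, not part of this patch):

	struct k3_rproc {
		struct rproc *rproc;
		struct ti_sci_proc *tsp;
		/* ... */
	};

	static int k3_rproc_prepare(struct rproc *rproc)
	{
		struct k3_rproc *kproc = rproc->priv;
		int ret;

		/* take ownership of the processor from the System Controller */
		ret = ti_sci_proc_request(kproc->tsp);
		if (ret)
			return ret;

		/* ... program the boot vector via ti_sci_proc_set_config(),
		 * release resets, etc. ... */
		return 0;
	}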

Signed-off-by: Suman Anna 
---
v3: New to this series, but the patch is identical to the one from the
 K3 R5F series posted previously, with patch title adjusted
 https://patchwork.kernel.org/patch/11456379/

  drivers/remoteproc/ti_sci_proc.h | 102 +++
  1 file changed, 102 insertions(+)
  create mode 100644 drivers/remoteproc/ti_sci_proc.h

diff --git a/drivers/remoteproc/ti_sci_proc.h b/drivers/remoteproc/ti_sci_proc.h
new file mode 100644
index ..e42d8015b8e7
--- /dev/null
+++ b/drivers/remoteproc/ti_sci_proc.h
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Texas Instruments TI-SCI Processor Controller Helper Functions
+ *
+ * Copyright (C) 2018-2020 Texas Instruments Incorporated - http://www.ti.com/
+ * Suman Anna
+ */
+
+#ifndef REMOTEPROC_TI_SCI_PROC_H
+#define REMOTEPROC_TI_SCI_PROC_H
+
+/**
+ * struct ti_sci_proc - structure representing a processor control client
+ * @sci: cached TI-SCI protocol handle
+ * @ops: cached TI-SCI proc ops
+ * @dev: cached client device pointer
+ * @proc_id: processor id for the consumer remoteproc device
+ * @host_id: host id to pass the control over for this consumer remoteproc
+ *  device
+ */
+struct ti_sci_proc {
+   const struct ti_sci_handle *sci;
+   const struct ti_sci_proc_ops *ops;
+   struct device *dev;


Please include the proper header files for the above structures.  


OK, I will move the #include  from the 
driver source files to here.


I would also have expected the name of the structure to be ti_sci_rproc,
but that choice is entirely yours.


This follows the terminology used in the TI SCI protocol and firmware 
code. I will leave it unchanged.




With the proper header files included:

Reviewed-by: Mathieu Poirier 


Thanks, I will await any comments from Rob on the bindings patch before 
I refresh this series.


regards
Suman




+   u8 proc_id;
+   u8 host_id;
+};
+
+static inline int ti_sci_proc_request(struct ti_sci_proc *tsp)
+{
+   int ret;
+
+   ret = tsp->ops->request(tsp->sci, tsp->proc_id);
+   if (ret)
+   dev_err(tsp->dev, "ti-sci processor request failed: %d\n",
+   ret);
+   return ret;
+}
+
+static inline int ti_sci_proc_release(struct ti_sci_proc *tsp)
+{
+   int ret;
+
+   ret = tsp->ops->release(tsp->sci, tsp->proc_id);
+   if (ret)
+   dev_err(tsp->dev, "ti-sci processor release failed: %d\n",
+   ret);
+   return ret;
+}
+
+static inline int ti_sci_proc_handover(struct ti_sci_proc *tsp)
+{
+   int ret;
+
+   ret = tsp->ops->handover(tsp->sci, tsp->proc_id, tsp->host_id);
+   if (ret)
+   dev_err(tsp->dev, "ti-sci processor handover of %d to %d failed: 
%d\n",
+   tsp->proc_id, tsp->host_id, ret);
+   return ret;
+}
+
+static inline int ti_sci_proc_set_config(struct ti_sci_proc *tsp,
+u64 boot_vector,
+u32 cfg_set, u32 cfg_clr)
+{
+   int ret;
+
+   ret = tsp->ops->set_config(tsp->sci, tsp->proc_id, boot_vector,
+  cfg_set, cfg_clr);
+   if (ret)
+   dev_err(tsp->dev, "ti-sci processor set_config failed: %d\n",
+   ret);
+   return ret;
+}
+
+static inline int ti_sci_proc_set_control(struct ti_sci_proc *tsp,
+ u32 ctrl_set, u32 ctrl_clr)
+{
+   int ret;
+
+   ret = tsp->ops->set_control(tsp->sci, tsp->proc_id, ctrl_set, ctrl_clr);
+   if (ret)
+   dev_err(tsp->dev, "ti-sci processor set_control failed: %d\n",
+   

Re: [PATCH] initrd: Remove erroneous comment

2020-06-22 Thread hpa
On June 19, 2020 5:03:33 PM PDT, ron minnich  wrote:
>It seems fine to me, but I did not initially object to the use of that
>name anyway. hpa, what do you think?
>
>On Fri, Jun 19, 2020 at 7:31 AM Tom Rini  wrote:
>>
>> Most architectures have been passing the location of an initrd via
>the
>> initrd= option since their inception.  Remove the comment as it's
>both
>> wrong and unrelated to the commit that introduced it.
>>
>> Fixes: 694cfd87b0c8 ("x86/setup: Add an initrdmem= option to specify
>initrd physical address")
>> Cc: Andrew Morton 
>> Cc: Borislav Petkov 
>> Cc: Dominik Brodowski 
>> Cc: H. Peter Anvin (Intel) 
>> Cc: Ronald G. Minnich 
>> Signed-off-by: Tom Rini 
>> ---
>> For a bit more context, I assume there's been some confusion between
>> "initrd" being a keyword in things like extlinux.conf and also that
>for
>> quite a long time now initrd information is passed via device tree
>and
>> not the command line on relevant architectures.  But it's still true
>> that it's been a valid command line option to the kernel since the
>90s.
>> It's just the case that in 2018 the code was consolidated from under
>> arch/ and in to this file.
>> ---
>>  init/do_mounts_initrd.c | 5 -
>>  1 file changed, 5 deletions(-)
>>
>> diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
>> index d72beda824aa..53314d7da4be 100644
>> --- a/init/do_mounts_initrd.c
>> +++ b/init/do_mounts_initrd.c
>> @@ -45,11 +45,6 @@ static int __init early_initrdmem(char *p)
>>  }
>>  early_param("initrdmem", early_initrdmem);
>>
>> -/*
>> - * This is here as the initrd keyword has been in use since 11/2018
>> - * on ARM, PowerPC, and MIPS.
>> - * It should not be; it is reserved for bootloaders.
>> - */
>>  static int __init early_initrd(char *p)
>>  {
>> return early_initrdmem(p);
>> --
>> 2.17.1
>>

Well, I observe that it was documented as reserved for bootloaders since the 
mid-90s at least.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


RE: [Intel-wired-lan] [PATCH][next] ice: Use struct_size() helper

2020-06-22 Thread Allan, Bruce W
> -Original Message-
> From: Intel-wired-lan  On Behalf Of
> Gustavo A. R. Silva
> Sent: Friday, June 19, 2020 10:56 AM
> To: Kirsher, Jeffrey T ; David S. Miller
> ; Jakub Kicinski 
> Cc: net...@vger.kernel.org; intel-wired-...@lists.osuosl.org; linux-
> ker...@vger.kernel.org
> Subject: [Intel-wired-lan] [PATCH][next] ice: Use struct_size() helper
> 
> Make use of the struct_size() helper instead of an open-coded version
> in order to avoid any potential type mistakes.
> 
> This code was detected with the help of Coccinelle, and audited and
> fixed manually.
> 
> Addresses-KSPP-ID: https://github.com/KSPP/linux/issues/83
> Signed-off-by: Gustavo A. R. Silva 
> ---
>  drivers/net/ethernet/intel/ice/ice_flex_pipe.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

This is already fixed in an in-process patch set that converts one-element
arrays to flexible arrays, which Jeff Kirsher has mentioned before and
should be pushed shortly.

> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_flex_pipe.c
> b/drivers/net/ethernet/intel/ice/ice_flex_pipe.c
> index 4420fc02f7e7..d92c4d70dbcd 100644
> --- a/drivers/net/ethernet/intel/ice/ice_flex_pipe.c
> +++ b/drivers/net/ethernet/intel/ice/ice_flex_pipe.c
> @@ -1121,8 +1121,7 @@ static enum ice_status ice_get_pkg_info(struct
> ice_hw *hw)
>   u16 size;
>   u32 i;
> 
> - size = sizeof(*pkg_info) + (sizeof(pkg_info->pkg_info[0]) *
> - (ICE_PKG_CNT - 1));
> + size = struct_size(pkg_info, pkg_info, ICE_PKG_CNT - 1);
>   pkg_info = kzalloc(size, GFP_KERNEL);
>   if (!pkg_info)
>   return ICE_ERR_NO_MEMORY;
> --
> 2.27.0
> 
> ___
> Intel-wired-lan mailing list
> intel-wired-...@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
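
For reference, the pattern being converted is the classic one-element
array sizing idiom (a generic sketch, not the ice-specific types;
struct_size() comes from <linux/overflow.h>):

	struct item { u32 x; };

	struct item_list {
		u32 count;
		struct item items[1];	/* one-element array */
	};

	/* open-coded: easy to get a type or the "- 1" wrong */
	size = sizeof(*list) + sizeof(list->items[0]) * (n - 1);

	/* struct_size() computes the same size, with overflow checking */
	size = struct_size(list, items, n - 1);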


[PATCH] dma-remap: Align the size in dma_common_*_remap()

2020-06-22 Thread Eric Auger
Running a guest with a virtio-iommu protecting virtio devices
is broken since commit 515e5b6d90d4 ("dma-mapping: use vmap insted
of reimplementing it"). Before the conversion, the size was
page aligned in __get_vm_area_node(). Aligning the size again fixes the
regression.
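
A worked example of the off-by-one-page this fixes (sizes illustrative):
with 4 KiB pages and size = 0x1800 (6 KiB),

	size >> PAGE_SHIFT		/* = 1: maps only the first page */
	PAGE_ALIGN(size) >> PAGE_SHIFT	/* = 2: maps both pages */

so any size that is not a multiple of PAGE_SIZE previously lost its tail
page across the vmap() conversion.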

Fixes: 515e5b6d90d4 ("dma-mapping: use vmap insted of reimplementing it")
Signed-off-by: Eric Auger 
---
 kernel/dma/remap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
index e739a6eea6e7..a3151a9b0c08 100644
--- a/kernel/dma/remap.c
+++ b/kernel/dma/remap.c
@@ -24,7 +24,7 @@ void *dma_common_pages_remap(struct page **pages, size_t size,
 {
void *vaddr;
 
-   vaddr = vmap(pages, size >> PAGE_SHIFT, VM_DMA_COHERENT, prot);
+   vaddr = vmap(pages, PAGE_ALIGN(size) >> PAGE_SHIFT, VM_DMA_COHERENT, 
prot);
if (vaddr)
find_vm_area(vaddr)->pages = pages;
return vaddr;
@@ -37,7 +37,7 @@ void *dma_common_pages_remap(struct page **pages, size_t size,
 void *dma_common_contiguous_remap(struct page *page, size_t size,
pgprot_t prot, const void *caller)
 {
-   int count = size >> PAGE_SHIFT;
+   int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
struct page **pages;
void *vaddr;
int i;
-- 
2.20.1



[PATCH 1/6] KVM: x86/mmu: Move mmu_audit.c and mmutrace.h into the mmu/ sub-directory

2020-06-22 Thread Sean Christopherson
Move mmu_audit.c and mmutrace.h under mmu/ where they belong.

Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/{ => mmu}/mmu_audit.c | 0
 arch/x86/kvm/{ => mmu}/mmutrace.h  | 2 +-
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename arch/x86/kvm/{ => mmu}/mmu_audit.c (100%)
 rename arch/x86/kvm/{ => mmu}/mmutrace.h (99%)

diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu/mmu_audit.c
similarity index 100%
rename from arch/x86/kvm/mmu_audit.c
rename to arch/x86/kvm/mmu/mmu_audit.c
diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
similarity index 99%
rename from arch/x86/kvm/mmutrace.h
rename to arch/x86/kvm/mmu/mmutrace.h
index ffcd96fc02d0..9d15bc0c535b 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -387,7 +387,7 @@ TRACE_EVENT(
 #endif /* _TRACE_KVMMMU_H */
 
 #undef TRACE_INCLUDE_PATH
-#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_PATH mmu
 #undef TRACE_INCLUDE_FILE
 #define TRACE_INCLUDE_FILE mmutrace
 
-- 
2.26.0
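
For context, TRACE_INCLUDE_PATH is consumed by the trace/define_trace.h
machinery when the tracepoints are instantiated: it names the directory,
relative to the include search path, from which the trace header is
re-included, so moving mmutrace.h into mmu/ requires the path to follow.
A sketch of the standard pattern in the instantiating .c file:

	/* in arch/x86/kvm/mmu/mmu.c (usual tracepoint plumbing) */
	#define CREATE_TRACE_POINTS
	#include "mmutrace.h"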



[PATCH 3/6] KVM: x86/mmu: Add MMU-internal header

2020-06-22 Thread Sean Christopherson
Add mmu/mmu_internal.h to hold declarations and definitions that need
to be shared between various mmu/ files, but should not be used by
anything outside of the MMU.

Begin populating mmu_internal.h with declarations of the helpers used by
page_track.c.

Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu.h  |  4 
 arch/x86/kvm/mmu/mmu.c  |  1 +
 arch/x86/kvm/mmu/mmu_internal.h | 10 ++
 arch/x86/kvm/mmu/page_track.c   |  2 +-
 4 files changed, 12 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/kvm/mmu/mmu_internal.h

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index d46944488e72..434acfcbf710 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -209,10 +209,6 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, 
struct kvm_mmu *mmu,
 
 void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
 
-void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
-void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
-bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
-   struct kvm_memory_slot *slot, u64 gfn);
 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_post_init_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1b4d45b8f462..c1bf30e24bfc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -18,6 +18,7 @@
 #include "irq.h"
 #include "ioapic.h"
 #include "mmu.h"
+#include "mmu_internal.h"
 #include "x86.h"
 #include "kvm_cache_regs.h"
 #include "kvm_emulate.h"
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
new file mode 100644
index ..d7938c37c7de
--- /dev/null
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_X86_MMU_INTERNAL_H
+#define __KVM_X86_MMU_INTERNAL_H
+
+void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
+void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
+bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
+   struct kvm_memory_slot *slot, u64 gfn);
+
+#endif /* __KVM_X86_MMU_INTERNAL_H */
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index a7bcde34d1f2..a84a141a2ad2 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -16,7 +16,7 @@
 
 #include 
 
-#include "mmu.h"
+#include "mmu_internal.h"
 
 void kvm_page_track_free_memslot(struct kvm_memory_slot *slot)
 {
-- 
2.26.0



[PATCH 4/6] KVM: x86/mmu: Make kvm_mmu_page definition and accessor internal-only

2020-06-22 Thread Sean Christopherson
Make 'struct kvm_mmu_page' MMU-only; nothing outside of the MMU should
be poking into the gory details of shadow pages.

Signed-off-by: Sean Christopherson 
---
 arch/x86/include/asm/kvm_host.h | 46 ++-
 arch/x86/kvm/mmu/mmu_internal.h | 48 +
 2 files changed, 50 insertions(+), 44 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f8998e97457f..86933c467a1e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -322,43 +322,6 @@ struct kvm_rmap_head {
unsigned long val;
 };
 
-struct kvm_mmu_page {
-   struct list_head link;
-   struct hlist_node hash_link;
-   struct list_head lpage_disallowed_link;
-
-   bool unsync;
-   u8 mmu_valid_gen;
-   bool mmio_cached;
-   bool lpage_disallowed; /* Can't be replaced by an equiv large page */
-
-   /*
-* The following two entries are used to key the shadow page in the
-* hash table.
-*/
-   union kvm_mmu_page_role role;
-   gfn_t gfn;
-
-   u64 *spt;
-   /* hold the gfn of each spte inside spt */
-   gfn_t *gfns;
-   int root_count;  /* Currently serving as active root */
-   unsigned int unsync_children;
-   struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
-   DECLARE_BITMAP(unsync_child_bitmap, 512);
-
-#ifdef CONFIG_X86_32
-   /*
-* Used out of the mmu-lock to avoid reading spte values while an
-* update is in progress; see the comments in __get_spte_lockless().
-*/
-   int clear_spte_count;
-#endif
-
-   /* Number of writes since the last time traversal visited this page.  */
-   atomic_t write_flooding_count;
-};
-
 struct kvm_pio_request {
unsigned long linear_rip;
unsigned long count;
@@ -384,6 +347,8 @@ struct kvm_mmu_root_info {
 
 #define KVM_MMU_NUM_PREV_ROOTS 3
 
+struct kvm_mmu_page;
+
 /*
  * x86 supports 4 paging modes (5-level 64-bit, 4-level 64-bit, 3-level 32-bit,
  * and 2-level 32-bit).  The kvm_mmu structure abstracts the details of the
@@ -1557,13 +1522,6 @@ static inline gpa_t translate_gpa(struct kvm_vcpu *vcpu, 
gpa_t gpa, u32 access,
return gpa;
 }
 
-static inline struct kvm_mmu_page *page_header(hpa_t shadow_page)
-{
-   struct page *page = pfn_to_page(shadow_page >> PAGE_SHIFT);
-
-   return (struct kvm_mmu_page *)page_private(page);
-}
-
 static inline u16 kvm_read_ldt(void)
 {
u16 ldt;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index d7938c37c7de..8afa60f0a1a5 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -2,6 +2,54 @@
 #ifndef __KVM_X86_MMU_INTERNAL_H
 #define __KVM_X86_MMU_INTERNAL_H
 
+#include <linux/types.h>
+
+#include <asm/kvm_host.h>
+
+struct kvm_mmu_page {
+   struct list_head link;
+   struct hlist_node hash_link;
+   struct list_head lpage_disallowed_link;
+
+   bool unsync;
+   u8 mmu_valid_gen;
+   bool mmio_cached;
+   bool lpage_disallowed; /* Can't be replaced by an equiv large page */
+
+   /*
+* The following two entries are used to key the shadow page in the
+* hash table.
+*/
+   union kvm_mmu_page_role role;
+   gfn_t gfn;
+
+   u64 *spt;
+   /* hold the gfn of each spte inside spt */
+   gfn_t *gfns;
+   int root_count;  /* Currently serving as active root */
+   unsigned int unsync_children;
+   struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
+   DECLARE_BITMAP(unsync_child_bitmap, 512);
+
+#ifdef CONFIG_X86_32
+   /*
+* Used out of the mmu-lock to avoid reading spte values while an
+* update is in progress; see the comments in __get_spte_lockless().
+*/
+   int clear_spte_count;
+#endif
+
+   /* Number of writes since the last time traversal visited this page.  */
+   atomic_t write_flooding_count;
+};
+
+static inline struct kvm_mmu_page *page_header(hpa_t shadow_page)
+{
+   struct page *page = pfn_to_page(shadow_page >> PAGE_SHIFT);
+
+   return (struct kvm_mmu_page *)page_private(page);
+}
+
 void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
 void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
 bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
-- 
2.26.0



[PATCH 6/6] KVM: x86/mmu: Rename page_header() to to_shadow_page()

2020-06-22 Thread Sean Christopherson
Rename KVM's accessor for retrieving a 'struct kvm_mmu_page' from the
associated host physical address to better convey what the function is
doing.

Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c  | 20 ++--
 arch/x86/kvm/mmu/mmu_audit.c|  6 +++---
 arch/x86/kvm/mmu/mmu_internal.h |  4 ++--
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index cd1f8017de8a..258334b4e563 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2208,7 +2208,7 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
continue;
}
 
-   child = page_header(ent & PT64_BASE_ADDR_MASK);
+   child = to_shadow_page(ent & PT64_BASE_ADDR_MASK);
 
if (child->unsync_children) {
if (mmu_pages_add(pvec, child, i))
@@ -2656,7 +2656,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, 
u64 *sptep,
 * so we should update the spte at this point to get
 * a new sp with the correct access.
 */
-   child = page_header(*sptep & PT64_BASE_ADDR_MASK);
+   child = to_shadow_page(*sptep & PT64_BASE_ADDR_MASK);
if (child->role.access == direct_access)
return;
 
@@ -2678,7 +2678,7 @@ static bool mmu_page_zap_pte(struct kvm *kvm, struct 
kvm_mmu_page *sp,
if (is_large_pte(pte))
--kvm->stat.lpages;
} else {
-   child = page_header(pte & PT64_BASE_ADDR_MASK);
+   child = to_shadow_page(pte & PT64_BASE_ADDR_MASK);
drop_parent_pte(child, spte);
}
return true;
@@ -3110,7 +3110,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
struct kvm_mmu_page *child;
u64 pte = *sptep;
 
-   child = page_header(pte & PT64_BASE_ADDR_MASK);
+   child = to_shadow_page(pte & PT64_BASE_ADDR_MASK);
drop_parent_pte(child, sptep);
flush = true;
} else if (pfn != spte_to_pfn(*sptep)) {
@@ -3615,7 +3615,7 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t 
*root_hpa,
if (!VALID_PAGE(*root_hpa))
return;
 
-   sp = page_header(*root_hpa & PT64_BASE_ADDR_MASK);
+   sp = to_shadow_page(*root_hpa & PT64_BASE_ADDR_MASK);
--sp->root_count;
if (!sp->root_count && sp->role.invalid)
kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
@@ -3845,7 +3845,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 
if (vcpu->arch.mmu->root_level >= PT64_ROOT_4LEVEL) {
hpa_t root = vcpu->arch.mmu->root_hpa;
-   sp = page_header(root);
+   sp = to_shadow_page(root);
 
/*
 * Even if another CPU was marking the SP as unsync-ed
@@ -3879,7 +3879,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 
if (root && VALID_PAGE(root)) {
root &= PT64_BASE_ADDR_MASK;
-   sp = page_header(root);
+   sp = to_shadow_page(root);
mmu_sync_children(vcpu, sp);
}
}
@@ -4235,8 +4235,8 @@ static inline bool is_root_usable(struct 
kvm_mmu_root_info *root, gpa_t pgd,
  union kvm_mmu_page_role role)
 {
return (role.direct || pgd == root->pgd) &&
-  VALID_PAGE(root->hpa) && page_header(root->hpa) &&
-  role.word == page_header(root->hpa)->role.word;
+  VALID_PAGE(root->hpa) && to_shadow_page(root->hpa) &&
+  role.word == to_shadow_page(root->hpa)->role.word;
 }
 
 /*
@@ -4321,7 +4321,7 @@ static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, 
gpa_t new_pgd,
 */
vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
 
-   __clear_sp_write_flooding_count(page_header(vcpu->arch.mmu->root_hpa));
+   
__clear_sp_write_flooding_count(to_shadow_page(vcpu->arch.mmu->root_hpa));
 }
 
 void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd, bool skip_tlb_flush,
diff --git a/arch/x86/kvm/mmu/mmu_audit.c b/arch/x86/kvm/mmu/mmu_audit.c
index 6ba703d3497f..c8d51a37e2ce 100644
--- a/arch/x86/kvm/mmu/mmu_audit.c
+++ b/arch/x86/kvm/mmu/mmu_audit.c
@@ -45,7 +45,7 @@ static void __mmu_spte_walk(struct kvm_vcpu *vcpu, struct 
kvm_mmu_page *sp,
  !is_last_spte(ent[i], level)) {
struct kvm_mmu_page *child;
 
-   child = page_header(ent[i] & PT64_BASE_ADDR_MASK);
+   child = to_shadow_page(ent[i] & PT64_BASE_ADDR_MASK);
__mmu_spte_walk(vcpu, child, fn, level - 1);
}
}
@@ -62,7 +62,7 @@ 

[PATCH 0/6] KVM: x86/mmu: Files and sp helper cleanups

2020-06-22 Thread Sean Christopherson
Move more files to the mmu/ directory, add an mmu_internal.h to share
stuff amongst the mmu/ files, and clean up the helpers for retrieving a
shadow page from a sptep and/or hpa.

Sean Christopherson (6):
  KVM: x86/mmu: Move mmu_audit.c and mmutrace.h into the mmu/
sub-directory
  KVM: x86/mmu: Move kvm_mmu_available_pages() into mmu.c
  KVM: x86/mmu: Add MMU-internal header
  KVM: x86/mmu: Make kvm_mmu_page definition and accessor internal-only
  KVM: x86/mmu: Add sptep_to_sp() helper to wrap shadow page lookup
  KVM: x86/mmu: Rename page_header() to to_shadow_page()

 arch/x86/include/asm/kvm_host.h| 46 +-
 arch/x86/kvm/mmu.h | 13 --
 arch/x86/kvm/mmu/mmu.c | 58 +++
 arch/x86/kvm/{ => mmu}/mmu_audit.c | 12 +++---
 arch/x86/kvm/mmu/mmu_internal.h| 63 ++
 arch/x86/kvm/{ => mmu}/mmutrace.h  |  2 +-
 arch/x86/kvm/mmu/page_track.c  |  2 +-
 arch/x86/kvm/mmu/paging_tmpl.h |  4 +-
 8 files changed, 108 insertions(+), 92 deletions(-)
 rename arch/x86/kvm/{ => mmu}/mmu_audit.c (96%)
 create mode 100644 arch/x86/kvm/mmu/mmu_internal.h
 rename arch/x86/kvm/{ => mmu}/mmutrace.h (99%)

-- 
2.26.0



[PATCH 2/6] KVM: x86/mmu: Move kvm_mmu_available_pages() into mmu.c

2020-06-22 Thread Sean Christopherson
Move kvm_mmu_available_pages() from mmu.h to mmu.c; it has a single
caller and has no business being exposed via mmu.h.

Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu.h | 9 -
 arch/x86/kvm/mmu/mmu.c | 9 +
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 0ad06bfe2c2c..d46944488e72 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -64,15 +64,6 @@ bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu);
 int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
u64 fault_address, char *insn, int insn_len);
 
-static inline unsigned long kvm_mmu_available_pages(struct kvm *kvm)
-{
-   if (kvm->arch.n_max_mmu_pages > kvm->arch.n_used_mmu_pages)
-   return kvm->arch.n_max_mmu_pages -
-   kvm->arch.n_used_mmu_pages;
-
-   return 0;
-}
-
 static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
 {
if (likely(vcpu->arch.mmu->root_hpa != INVALID_PAGE))
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index fdd05c233308..1b4d45b8f462 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2825,6 +2825,15 @@ static bool prepare_zap_oldest_mmu_page(struct kvm *kvm,
return kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
 }
 
+static inline unsigned long kvm_mmu_available_pages(struct kvm *kvm)
+{
+   if (kvm->arch.n_max_mmu_pages > kvm->arch.n_used_mmu_pages)
+   return kvm->arch.n_max_mmu_pages -
+   kvm->arch.n_used_mmu_pages;
+
+   return 0;
+}
+
 static int make_mmu_pages_available(struct kvm_vcpu *vcpu)
 {
LIST_HEAD(invalid_list);
-- 
2.26.0



[PATCH 5/6] KVM: x86/mmu: Add sptep_to_sp() helper to wrap shadow page lookup

2020-06-22 Thread Sean Christopherson
Introduce sptep_to_sp() to reduce the boilerplate code needed to get the
shadow page associated with a spte pointer, and to improve readability
as it's not immediately obvious that "page_header" is a KVM-specific
accessor for retrieving a shadow page.
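
The helper itself lands in mmu_internal.h (its hunk is cut off in the
excerpt below); presumably it is a thin wrapper along these lines,
folding the __pa() conversion into the lookup so callers no longer
repeat it:

	static inline struct kvm_mmu_page *sptep_to_sp(u64 *sptep)
	{
		return page_header(__pa(sptep));
	}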

Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c  | 28 +---
 arch/x86/kvm/mmu/mmu_audit.c|  6 +++---
 arch/x86/kvm/mmu/mmu_internal.h |  5 +
 arch/x86/kvm/mmu/paging_tmpl.h  |  4 ++--
 4 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c1bf30e24bfc..cd1f8017de8a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -677,7 +677,7 @@ union split_spte {
 
 static void count_spte_clear(u64 *sptep, u64 spte)
 {
-   struct kvm_mmu_page *sp =  page_header(__pa(sptep));
+   struct kvm_mmu_page *sp =  sptep_to_sp(sptep);
 
if (is_shadow_present_pte(spte))
return;
@@ -761,7 +761,7 @@ static u64 __update_clear_spte_slow(u64 *sptep, u64 spte)
  */
 static u64 __get_spte_lockless(u64 *sptep)
 {
-   struct kvm_mmu_page *sp =  page_header(__pa(sptep));
+   struct kvm_mmu_page *sp =  sptep_to_sp(sptep);
union split_spte spte, *orig = (union split_spte *)sptep;
int count;
 
@@ -1427,7 +1427,7 @@ static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, 
gfn_t gfn)
struct kvm_mmu_page *sp;
struct kvm_rmap_head *rmap_head;
 
-   sp = page_header(__pa(spte));
+   sp = sptep_to_sp(spte);
kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn);
rmap_head = gfn_to_rmap(vcpu->kvm, gfn, sp);
return pte_list_add(vcpu, spte, rmap_head);
@@ -1439,7 +1439,7 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
gfn_t gfn;
struct kvm_rmap_head *rmap_head;
 
-   sp = page_header(__pa(spte));
+   sp = sptep_to_sp(spte);
gfn = kvm_mmu_page_get_gfn(sp, spte - sp->spt);
rmap_head = gfn_to_rmap(kvm, gfn, sp);
__pte_list_remove(spte, rmap_head);
@@ -1531,7 +1531,7 @@ static void drop_spte(struct kvm *kvm, u64 *sptep)
 static bool __drop_large_spte(struct kvm *kvm, u64 *sptep)
 {
if (is_large_pte(*sptep)) {
-   WARN_ON(page_header(__pa(sptep))->role.level == PG_LEVEL_4K);
+   WARN_ON(sptep_to_sp(sptep)->role.level == PG_LEVEL_4K);
drop_spte(kvm, sptep);
--kvm->stat.lpages;
return true;
@@ -1543,7 +1543,7 @@ static bool __drop_large_spte(struct kvm *kvm, u64 *sptep)
 static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
 {
if (__drop_large_spte(vcpu->kvm, sptep)) {
-   struct kvm_mmu_page *sp = page_header(__pa(sptep));
+   struct kvm_mmu_page *sp = sptep_to_sp(sptep);
 
kvm_flush_remote_tlbs_with_address(vcpu->kvm, sp->gfn,
KVM_PAGES_PER_HPAGE(sp->role.level));
@@ -2017,7 +2017,7 @@ static void rmap_recycle(struct kvm_vcpu *vcpu, u64 
*spte, gfn_t gfn)
struct kvm_rmap_head *rmap_head;
struct kvm_mmu_page *sp;
 
-   sp = page_header(__pa(spte));
+   sp = sptep_to_sp(spte);
 
rmap_head = gfn_to_rmap(vcpu->kvm, gfn, sp);
 
@@ -2139,7 +2139,7 @@ static void mark_unsync(u64 *spte)
struct kvm_mmu_page *sp;
unsigned int index;
 
-   sp = page_header(__pa(spte));
+   sp = sptep_to_sp(spte);
index = spte - sp->spt;
if (__test_and_set_bit(index, sp->unsync_child_bitmap))
return;
@@ -2465,9 +2465,7 @@ static void __clear_sp_write_flooding_count(struct 
kvm_mmu_page *sp)
 
 static void clear_sp_write_flooding_count(u64 *spte)
 {
-   struct kvm_mmu_page *sp =  page_header(__pa(spte));
-
-   __clear_sp_write_flooding_count(sp);
+   __clear_sp_write_flooding_count(sptep_to_sp(spte));
 }
 
 static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
@@ -3009,7 +3007,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
if (set_mmio_spte(vcpu, sptep, gfn, pfn, pte_access))
return 0;
 
-   sp = page_header(__pa(sptep));
+   sp = sptep_to_sp(sptep);
if (sp_ad_disabled(sp))
spte |= SPTE_AD_DISABLED_MASK;
else if (kvm_vcpu_ad_need_write_protect(vcpu))
@@ -3222,7 +3220,7 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, 
u64 *sptep)
 {
struct kvm_mmu_page *sp;
 
-   sp = page_header(__pa(sptep));
+   sp = sptep_to_sp(sptep);
 
/*
 * Without accessed bits, there's no way to distinguish between
@@ -3530,7 +3528,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gpa_t 
cr2_or_gpa,
if (!is_shadow_present_pte(spte))
break;
 
-   sp = page_header(__pa(iterator.sptep));
+   sp = sptep_to_sp(iterator.sptep);
if (!is_last_spte(spte, sp->role.level))
break;
 
@@ 

Re: [PATCH] binder: fix null deref of proc->context

2020-06-22 Thread Todd Kjos
On Mon, Jun 22, 2020 at 1:09 PM Christian Brauner
 wrote:
>
> On Mon, Jun 22, 2020 at 01:07:15PM -0700, Todd Kjos wrote:
> > The binder driver makes the assumption that the proc->context pointer is
> > invariant after initialization (as documented in the kerneldoc header for
> > struct proc).
> > However, in commit f0fe2c0f050d ("binder: prevent UAF for binderfs devices 
> > II")
> > proc->context is set to NULL during binder_deferred_release().
> >
> > Another proc was in the middle of setting up a transaction to the dying
> > process and crashed on a NULL pointer deref on "context" which is a local
> > set to >context:
> >
> > new_ref->data.desc = (node == context->binder_context_mgr_node) ? 0 : 1;
> >
> > Here's the stack:
> >
> > [ 5237.855435] Call trace:
> > [ 5237.855441] binder_get_ref_for_node_olocked+0x100/0x2ec
> > [ 5237.855446] binder_inc_ref_for_node+0x140/0x280
> > [ 5237.855451] binder_translate_binder+0x1d0/0x388
> > [ 5237.855456] binder_transaction+0x2228/0x3730
> > [ 5237.855461] binder_thread_write+0x640/0x25bc
> > [ 5237.855466] binder_ioctl_write_read+0xb0/0x464
> > [ 5237.855471] binder_ioctl+0x30c/0x96c
> > [ 5237.855477] do_vfs_ioctl+0x3e0/0x700
> > [ 5237.855482] __arm64_sys_ioctl+0x78/0xa4
> > [ 5237.855488] el0_svc_common+0xb4/0x194
> > [ 5237.855493] el0_svc_handler+0x74/0x98
> > [ 5237.855497] el0_svc+0x8/0xc
> >
> > The fix is to move the kfree of the binder_device to binder_free_proc()
> > so the binder_device is freed when we know there are no references
> > remaining on the binder_proc.
> >
> > Fixes: f0fe2c0f050d ("binder: prevent UAF for binderfs devices II")
> > Signed-off-by: Todd Kjos 

Forgot to include stable. The issue was introduced in 5.6, so fix needed in 5.7.
Cc: sta...@vger.kernel.org # 5.7


>
>
> Thanks, looks good to me!
> Acked-by: Christian Brauner 
>
> Christian
>
> > ---
> >  drivers/android/binder.c | 14 +++---
> >  1 file changed, 7 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> > index e47c8a4c83db..f50c5f182bb5 100644
> > --- a/drivers/android/binder.c
> > +++ b/drivers/android/binder.c
> > @@ -4686,8 +4686,15 @@ static struct binder_thread 
> > *binder_get_thread(struct binder_proc *proc)
> >
> >  static void binder_free_proc(struct binder_proc *proc)
> >  {
> > + struct binder_device *device;
> > +
> >   BUG_ON(!list_empty(&proc->todo));
> >   BUG_ON(!list_empty(&proc->delivered_death));
> > + device = container_of(proc->context, struct binder_device, context);
> > + if (refcount_dec_and_test(&device->ref)) {
> > + kfree(proc->context->name);
> > + kfree(device);
> > + }
> >   binder_alloc_deferred_release(&proc->alloc);
> >   put_task_struct(proc->tsk);
> >   binder_stats_deleted(BINDER_STAT_PROC);
> > @@ -5406,7 +5413,6 @@ static int binder_node_release(struct binder_node 
> > *node, int refs)
> >  static void binder_deferred_release(struct binder_proc *proc)
> >  {
> >   struct binder_context *context = proc->context;
> > - struct binder_device *device;
> >   struct rb_node *n;
> >   int threads, nodes, incoming_refs, outgoing_refs, active_transactions;
> >
> > @@ -5423,12 +5429,6 @@ static void binder_deferred_release(struct 
> > binder_proc *proc)
> >   context->binder_context_mgr_node = NULL;
> >   }
> >   mutex_unlock(&context->context_mgr_node_lock);
> > - device = container_of(proc->context, struct binder_device, context);
> > - if (refcount_dec_and_test(&device->ref)) {
> > - kfree(context->name);
> > - kfree(device);
> > - }
> > - proc->context = NULL;
> >   binder_inner_proc_lock(proc);
> >   /*
> >* Make sure proc stays alive after we
> > --
> > 2.27.0.111.gc72c7da667-goog
> >


[PATCH] trivial: fix kerneldoc comments

2020-06-22 Thread Julia Lawall
Fix the parameter names in the comment to correspond to those in the
function header.

Drop comments about return values when there is no return value.

Signed-off-by: Julia Lawall 
---
 arch/arm/mach-omap2/omap-secure.c|  2 +-
 arch/arm/mach-prima2/rtciobrg.c  |  2 +-
 arch/mips/cavium-octeon/executive/cvmx-spi.c |  4 +---
 arch/mips/kvm/tlb.c  |  2 +-
 arch/parisc/kernel/drivers.c |  2 +-
 arch/powerpc/kernel/eeh_pe.c |  4 ++--
 arch/powerpc/kernel/uprobes.c|  2 +-
 crypto/asymmetric_keys/verify_pefile.c   |  2 +-
 drivers/ata/libata-transport.c   |  4 ++--
 drivers/base/power/wakeup.c  |  2 +-
 drivers/bus/fsl-mc/mc-io.c   |  2 +-
 drivers/crypto/bcm/spu.c |  4 ++--
 drivers/crypto/qat/qat_common/adf_dev_mgr.c  |  2 +-
 drivers/gpu/drm/omapdrm/omap_gem.c   |  2 +-
 drivers/gpu/drm/radeon/r100.c|  2 +-
 drivers/gpu/drm/radeon/radeon_kms.c  |  1 -
 drivers/hid/hid-core.c   |  4 ++--
 drivers/iio/dummy/iio_simple_dummy_buffer.c  |  2 +-
 drivers/infiniband/core/roce_gid_mgmt.c  |  2 +-
 drivers/infiniband/hw/i40iw/i40iw_hmc.c  |  4 ++--
 drivers/infiniband/hw/qib/qib_driver.c   |  2 +-
 drivers/infiniband/hw/qib/qib_iba7220.c  |  2 +-
 drivers/infiniband/sw/rdmavt/srq.c   |  2 +-
 drivers/leds/trigger/ledtrig-cpu.c   |  2 +-
 drivers/mfd/atmel-smc.c  |  6 +++---
 drivers/misc/enclosure.c |  4 ++--
 drivers/net/bonding/bond_3ad.c   |  2 +-
 drivers/net/bonding/bond_main.c  |  2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  |  4 ++--
 drivers/net/ethernet/freescale/dpaa2/dpni.c  |  4 ++--
 drivers/net/ethernet/freescale/fman/fman.c   |  4 ++--
 drivers/net/net_failover.c   |  2 +-
 drivers/net/wireless/marvell/libertas/firmware.c |  4 ++--
 drivers/net/wireless/mediatek/mt7601u/phy.c  |  2 +-
 drivers/net/wireless/ti/wlcore/cmd.c |  2 +-
 drivers/of/address.c |  2 +-
 drivers/pci/hotplug/pci_hotplug_core.c   |  2 +-
 drivers/pinctrl/core.c   |  6 +++---
 drivers/power/supply/power_supply_core.c |  2 +-
 drivers/scsi/bnx2i/bnx2i_hwi.c   | 16 
 drivers/scsi/isci/port.c | 16 
 drivers/scsi/libfc/fc_fcp.c  |  2 +-
 drivers/scsi/mpt3sas/mpt3sas_base.c  |  2 +-
 drivers/soc/mediatek/mtk-infracfg.c  |  4 ++--
 drivers/soundwire/stream.c   |  2 +-
 drivers/thunderbolt/ctl.c|  2 ++
 drivers/tty/tty_buffer.c |  4 ++--
 drivers/tty/tty_ldisc.c  |  2 +-
 drivers/usb/cdns3/gadget.c   |  4 ++--
 fs/ecryptfs/crypto.c |  2 +-
 fs/gfs2/log.c|  2 +-
 fs/nilfs2/btree.c|  2 +-
 fs/nilfs2/cpfile.c   |  4 ++--
 fs/nilfs2/sufile.c   |  2 +-
 kernel/bpf/cgroup.c  |  2 +-
 kernel/stacktrace.c  |  2 +-
 lib/lru_cache.c  |  2 +-
 mm/sparse-vmemmap.c  |  2 +-
 net/netfilter/nf_tables_api.c|  8 
 net/nfc/nci/core.c   |  4 ++--
 net/tipc/msg.c   |  2 +-
 sound/ac97/bus.c |  4 ++--
 sound/soc/uniphier/aio-core.c|  3 +--
 63 files changed, 100 insertions(+), 102 deletions(-)

diff --git a/arch/arm/mach-omap2/omap-secure.c 
b/arch/arm/mach-omap2/omap-secure.c
index f70d561f37f7..eebc74384d5d 100644
--- a/arch/arm/mach-omap2/omap-secure.c
+++ b/arch/arm/mach-omap2/omap-secure.c
@@ -179,7 +179,7 @@ u32 rx51_secure_dispatcher(u32 idx, u32 process, u32 flag, 
u32 nargs,
 /**
  * rx51_secure_update_aux_cr: Routine to modify the contents of Auxiliary 
Control Register
  *  @set_bits: bits to set in ACR
- *  @clr_bits: bits to clear in ACR
+ *  @clear_bits: bits to clear in ACR
  *
  * Return the non-zero error value on failure.
 */
diff --git a/arch/arm/mach-prima2/rtciobrg.c b/arch/arm/mach-prima2/rtciobrg.c
index 97c0e333e3b9..cf345d3a08cd 100644
--- a/arch/arm/mach-prima2/rtciobrg.c
+++ b/arch/arm/mach-prima2/rtciobrg.c
@@ -127,7 +127,7 @@ static struct regmap_bus regmap_iobg = {
 /**
  * devm_regmap_init_iobg(): Initialise managed register map
  *
- * @iobg: Device that will be interacted with
+ * @dev: Device that will be interacted with
  * @config: Configuration for register map
  *

[PATCH] btrfs: tests: remove if duplicate in __check_free_space_extents()

2020-06-22 Thread Denis Efremov
num_extents is already checked in the next if condition, so the
duplicate check can be safely removed.

Signed-off-by: Denis Efremov 
---
 fs/btrfs/tests/free-space-tree-tests.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/tests/free-space-tree-tests.c 
b/fs/btrfs/tests/free-space-tree-tests.c
index 914eea5ba6a7..2c783d2f5228 100644
--- a/fs/btrfs/tests/free-space-tree-tests.c
+++ b/fs/btrfs/tests/free-space-tree-tests.c
@@ -60,8 +60,6 @@ static int __check_free_space_extents(struct 
btrfs_trans_handle *trans,
if (prev_bit == 0 && bit == 1) {
extent_start = offset;
} else if (prev_bit == 1 && bit == 0) {
-   if (i >= num_extents)
-   goto invalid;
if (i >= num_extents ||
extent_start != extents[i].start ||
offset - extent_start != 
extents[i].length)
-- 
2.26.2



Re: [PATCH v2] ima: move APPRAISE_BOOTPARAM dependency on ARCH_POLICY to runtime

2020-06-22 Thread Bruno Meneguele
On Mon, Jun 22, 2020 at 03:28:13PM -0400, Mimi Zohar wrote:
> On Mon, 2020-06-22 at 14:27 -0300, Bruno Meneguele wrote:
> > IMA_APPRAISE_BOOTPARAM has been marked as dependent on !IMA_ARCH_POLICY at
> > compile time, enforcing appraisal whenever the kernel has the arch
> > policy option enabled.
> > 
> > However, it breaks systems where the option is actually set but the system
> > wasn't booted on a "secure boot" platform. In this scenario, any time an
> > appraisal policy (i.e. ima_policy=appraise_tcb) is used it will be
> > forced, giving the user no chance to set the 'fix' state (ima_appraise=fix)
> > to actually measure the system's files.
> > 
> > This patch removes this compile-time dependency and moves it to a runtime
> > decision, based on the arch policy loading failure/success.
> > 
> > Cc: sta...@vger.kernel.org
> > Fixes: d958083a8f64 ("x86/ima: define arch_get_ima_policy() for x86")
> > Signed-off-by: Bruno Meneguele 
> > ---
> > changes from v1:
> > - removed "ima:" prefix from pr_info() message
> > 
> >  security/integrity/ima/Kconfig  | 2 +-
> >  security/integrity/ima/ima_policy.c | 8 ++--
> >  2 files changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/security/integrity/ima/Kconfig b/security/integrity/ima/Kconfig
> > index edde88dbe576..62dc11a5af01 100644
> > --- a/security/integrity/ima/Kconfig
> > +++ b/security/integrity/ima/Kconfig
> > @@ -232,7 +232,7 @@ config IMA_APPRAISE_REQUIRE_POLICY_SIGS
> >  
> >  config IMA_APPRAISE_BOOTPARAM
> > bool "ima_appraise boot parameter"
> > -   depends on IMA_APPRAISE && !IMA_ARCH_POLICY
> > +   depends on IMA_APPRAISE
> > default y
> > help
> >   This option enables the different "ima_appraise=" modes
> > diff --git a/security/integrity/ima/ima_policy.c 
> > b/security/integrity/ima/ima_policy.c
> > index e493063a3c34..c876617d4210 100644
> > --- a/security/integrity/ima/ima_policy.c
> > +++ b/security/integrity/ima/ima_policy.c
> > @@ -733,11 +733,15 @@ void __init ima_init_policy(void)
> >  * (Highest priority)
> >  */
> > arch_entries = ima_init_arch_policy();
> > -   if (!arch_entries)
> > +   if (!arch_entries) {
> > pr_info("No architecture policies found\n");
> > -   else
> > +   } else {
> > +   /* Force appraisal, preventing runtime xattr changes */
> > +   pr_info("setting IMA appraisal to enforced\n");
> > +   ima_appraise = IMA_APPRAISE_ENFORCE;
> > add_rules(arch_policy_entry, arch_entries,
> >   IMA_DEFAULT_POLICY | IMA_CUSTOM_POLICY);
> > +   }
> >  
> > /*
> >  * Insert the builtin "secure_boot" policy rules requiring file
> 
> CONFIG_IMA_APPRAISE_BOOTPARAM controls the "ima_appraise" mode bits.  
> The mode bits are or'ed with the MODULES, FIRMWARE, POLICY, and KEXEC
> bits, which have already been set in ima_init_arch_policy().
> 

Sorry for missing this part! Of course, I should've spotted that just by
following ima_appraise down the code.

> From ima.h:
> /* Appraise integrity measurements */
> #define IMA_APPRAISE_ENFORCE0x01
> #define IMA_APPRAISE_FIX0x02
> #define IMA_APPRAISE_LOG0x04
> #define IMA_APPRAISE_MODULES0x08
> #define IMA_APPRAISE_FIRMWARE   0x10
> #define IMA_APPRAISE_POLICY 0x20
> #define IMA_APPRAISE_KEXEC  0x40
> 
> As Nayna pointed out, this is applicable only when an architecture-specific
> "secure boot" policy is loaded.

Yes, will come up with patch covering only this case.

Thanks Mimi!
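
In other words, a follow-up needs to preserve the already-or'ed flag
bits rather than overwrite them wholesale -- roughly (a sketch only; the
exact handling belongs to the follow-up patch):

	/* force enforce mode, but keep the MODULES/FIRMWARE/POLICY/KEXEC
	 * bits that ima_init_arch_policy() has already set */
	ima_appraise &= ~(IMA_APPRAISE_FIX | IMA_APPRAISE_LOG);
	ima_appraise |= IMA_APPRAISE_ENFORCE;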

-- 
bmeneg 
PGP Key: http://bmeneg.com/pubkey.txt




Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

2020-06-22 Thread Jerome Glisse
On Mon, Jun 22, 2020 at 08:46:17AM -0300, Jason Gunthorpe wrote:
> On Fri, Jun 19, 2020 at 04:31:47PM -0400, Jerome Glisse wrote:
> > Not doable, as the page refcount can change for things unrelated to GUP. With
> > John's changes we can identify GUP, and we could potentially copy the GUPed
> > page instead of COW, but this can potentially slow down fork() and I am not
> > sure how acceptable this would be. Also, this does not solve GUP against
> > pages that are already in the fork tree, i.e. page P0 is in process A, which
> > forks; we now have page P0 in processes A and B. Now process A forks again,
> > and we have page P0 in A, B, and C. Here B and C are two branches with root
> > in A. B and/or C can keep forking and grow the fork tree.
> 
> For a long time now RDMA has broken COW pages when creating user DMA
> regions.
> 
> The problem has been that fork re-COW's regions that had their COW
> broken.
> 
> So, if you break the COW upon mapping and prevent fork (and others)
> from copying DMA pinned then you'd cover the cases.

I am not sure we want to prevent COW for pinned GUP pages; this would
change the current semantics and potentially break/slow down existing apps.

Anyway, I think we focus too much on fork/COW; it is just an unfixable
broken corner case, and the mmu notifier allows you to avoid it. Forcing a
real copy on fork would likely be seen as a regression by most people.
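
For reference, the mmu notifier route amounts to registering callbacks
that fire before the CPU page tables change, so a device can stop using
the old pages instead of pinning them (a minimal sketch; the my_* names
are placeholders):

	#include <linux/mmu_notifier.h>

	static int my_invalidate_range_start(struct mmu_notifier *mn,
				const struct mmu_notifier_range *range)
	{
		/* quiesce DMA to [range->start, range->end) and drop any
		 * device mappings of those pages */
		return 0;
	}

	static const struct mmu_notifier_ops my_mn_ops = {
		.invalidate_range_start = my_invalidate_range_start,
	};

	static struct mmu_notifier my_mn = { .ops = &my_mn_ops };

	/* at setup time, for the mm being tracked: */
	int ret = mmu_notifier_register(&my_mn, current->mm);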


> > The semantics were changed with 17839856fd588f4ab6b789f482ed3ffd7c403e1f to
> > somewhat "fix" that, but GUP fast is still susceptible to this.
> 
> Ah, so everyone breaks the COW now, not just RDMA...
> 
> What do you mean 'GUP fast is still susceptible to this'?

Not all GUP fast paths were updated (intentionally); __get_user_pages_fast(),
for instance, still keeps COW intact. People using GUP should really know
what they are doing.

Cheers,
Jérôme



Re: [PATCH] Replace HTTP links with HTTPS ones: Documentation/process

2020-06-22 Thread Alexander A. Klimov




Am 22.06.20 um 22:06 schrieb Miguel Ojeda:

On Mon, Jun 22, 2020 at 7:29 PM Joe Perches  wrote:


scripts/get_maintainer.pl --self-test=links has a reachability test
using wget.

Perhaps a script like that could be used for http:// vs https://


+1

Not sure about `--no-check-certificate` if the goal is to move to
"proper HTTPS". Perhaps we can try first without it and if that fails,
print a warning and try with `--no-check-certificate` etc.
To be honest, my script even blocked HTTPS->HTTP redirections, so I opt
for maximum security.




Cheers,
Miguel

Also, I opt for freezing the discussion about possible future runs of the
script until everything from the first run [1] has been applied.



[1]
➜  linux git:(master) git stash show --shortstat
 1857 files changed, 2664 insertions(+), 2664 deletions(-)
➜  linux git:(master)


Re: [PATCH] ima_evm_utils: extended calc_bootaggr to PCRs 8 - 9

2020-06-22 Thread Mimi Zohar
On Thu, 2020-06-18 at 16:11 -0400, Maurizio Drocco wrote:
> From: Maurizio 
> 
> If PCRs 8 - 9 are set (i.e. not all-zeros), calc_bootaggr should include
> them in the digest.
> 
> Signed-off-by: Maurizio Drocco 
> ---
>  src/evmctl.c | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/src/evmctl.c b/src/evmctl.c
> index 1d065ce..554571e 100644
> --- a/src/evmctl.c
> +++ b/src/evmctl.c
> @@ -1930,6 +1930,18 @@ static void calc_bootaggr(struct tpm_bank_info *bank)
>   }
>   }
>  
> + if (strcmp(bank->algo_name, "sha1") != 0) {
> + for (i = 8; i < 10; i++) {
> + if (memcmp(bank->pcr[i], zero, bank->digest_size) != 0) 
> {
> + err = EVP_DigestUpdate(pctx, bank->pcr[i], 
> bank->digest_size);
> + if (!err) {
> + log_err("EVP_DigestUpdate() failed\n");
> + return;
> + }
> + }
> + }
> + }

Roberto, now that we're only including the PCRs 8 & 9 in the non-sha1
"boot_aggregate", they can always be included.

Please reflect this change in the patch description and, here, in the
code.
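
I.e., the memcmp() against the all-zero buffer can simply be dropped for
the non-sha1 banks -- something like (a sketch of the suggested shape):

	if (strcmp(bank->algo_name, "sha1") != 0) {
		for (i = 8; i < 10; i++) {
			err = EVP_DigestUpdate(pctx, bank->pcr[i],
					       bank->digest_size);
			if (!err) {
				log_err("EVP_DigestUpdate() failed\n");
				return;
			}
		}
	}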

thanks,

Mimi


[PATCH v2 03/21] KVM: x86/mmu: Use consistent "mc" name for kvm_mmu_memory_cache locals

2020-06-22 Thread Sean Christopherson
Use "mc" for local variables to shorten line lengths and provide
consistent names, which will be especially helpful when some of the
helpers are moved to common KVM code in future patches.

No functional change intended.

Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index cbc101663a89..36c90f004ef4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1060,27 +1060,27 @@ static void walk_shadow_page_lockless_end(struct 
kvm_vcpu *vcpu)
local_irq_enable();
 }
 
-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, int min)
+static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
 {
void *obj;
 
-   if (cache->nobjs >= min)
+   if (mc->nobjs >= min)
return 0;
-   while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-   if (cache->kmem_cache)
-   obj = kmem_cache_zalloc(cache->kmem_cache, 
GFP_KERNEL_ACCOUNT);
+   while (mc->nobjs < ARRAY_SIZE(mc->objects)) {
+   if (mc->kmem_cache)
+   obj = kmem_cache_zalloc(mc->kmem_cache, 
GFP_KERNEL_ACCOUNT);
else
obj = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
if (!obj)
-   return cache->nobjs >= min ? 0 : -ENOMEM;
-   cache->objects[cache->nobjs++] = obj;
+   return mc->nobjs >= min ? 0 : -ENOMEM;
+   mc->objects[mc->nobjs++] = obj;
}
return 0;
 }
 
-static int mmu_memory_cache_free_objects(struct kvm_mmu_memory_cache *cache)
+static int mmu_memory_cache_free_objects(struct kvm_mmu_memory_cache *mc)
 {
-   return cache->nobjs;
+   return mc->nobjs;
 }
 
 static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
@@ -1395,10 +1395,10 @@ static struct kvm_rmap_head *gfn_to_rmap(struct kvm 
*kvm, gfn_t gfn,
 
 static bool rmap_can_add(struct kvm_vcpu *vcpu)
 {
-   struct kvm_mmu_memory_cache *cache;
+   struct kvm_mmu_memory_cache *mc;
 
-   cache = &vcpu->arch.mmu_pte_list_desc_cache;
-   return mmu_memory_cache_free_objects(cache);
+   mc = &vcpu->arch.mmu_pte_list_desc_cache;
+   return mmu_memory_cache_free_objects(mc);
 }
 
 static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
-- 
2.26.0



[PATCH v2 02/21] KVM: x86/mmu: Consolidate "page" variant of memory cache helpers

2020-06-22 Thread Sean Christopherson
Drop the "page" variants of the topup/free memory cache helpers, using
the existence of an associated kmem_cache to select the correct alloc
or free routine.

No functional change intended.

Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c | 37 +++--
 1 file changed, 11 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0830c195c9ed..cbc101663a89 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1067,7 +1067,10 @@ static int mmu_topup_memory_cache(struct 
kvm_mmu_memory_cache *cache, int min)
if (cache->nobjs >= min)
return 0;
while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-   obj = kmem_cache_zalloc(cache->kmem_cache, GFP_KERNEL_ACCOUNT);
+   if (cache->kmem_cache)
+   obj = kmem_cache_zalloc(cache->kmem_cache, 
GFP_KERNEL_ACCOUNT);
+   else
+   obj = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
if (!obj)
return cache->nobjs >= min ? 0 : -ENOMEM;
cache->objects[cache->nobjs++] = obj;
@@ -1082,30 +1085,12 @@ static int mmu_memory_cache_free_objects(struct 
kvm_mmu_memory_cache *cache)
 
 static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
 {
-   while (mc->nobjs)
-   kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]);
-}
-
-static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
-  int min)
-{
-   void *page;
-
-   if (cache->nobjs >= min)
-   return 0;
-   while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-   page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
-   if (!page)
-   return cache->nobjs >= min ? 0 : -ENOMEM;
-   cache->objects[cache->nobjs++] = page;
+   while (mc->nobjs) {
+   if (mc->kmem_cache)
+   kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]);
+   else
+   free_page((unsigned long)mc->objects[--mc->nobjs]);
}
-   return 0;
-}
-
-static void mmu_free_memory_cache_page(struct kvm_mmu_memory_cache *mc)
-{
-   while (mc->nobjs)
-   free_page((unsigned long)mc->objects[--mc->nobjs]);
 }
 
 static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
@@ -1116,7 +1101,7 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
   8 + PTE_PREFETCH_NUM);
if (r)
goto out;
-   r = mmu_topup_memory_cache_page(&vcpu->arch.mmu_page_cache, 8);
+   r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_cache, 8);
if (r)
goto out;
	r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache, 4);
@@ -1127,7 +1112,7 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
 static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 {
	mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
-   mmu_free_memory_cache_page(&vcpu->arch.mmu_page_cache);
+   mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
	mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
 }
 
-- 
2.26.0



[PATCH v2 00/21] KVM: Cleanup and unify kvm_mmu_memory_cache usage

2020-06-22 Thread Sean Christopherson
Note, patch 18 will conflict with the p4d rework in 5.8.  I originally
stated I would send v2 only after that got pulled into Paolo's tree, but
I got my timing wrong, i.e. I was thinking that would have already
happened.  I'll send v3 if necessary.  I wanted to get v2 out there now
that I actually compile tested other architectures.

Marc, I interpreted your "nothing caught fire" as a Tested-by for the arm64
patches; let me know if that's not what you intended.


This series resurrects Christoffer Dall's series[1] to provide a common
MMU memory cache implementation that can be shared by x86, arm64 and MIPS.

It also picks up a suggested change from Ben Gardon[2] to clear shadow
page tables during initial allocation so as to avoid clearing entire
pages while holding mmu_lock.

The front half of the patches do house cleaning on x86's memory cache
implementation in preparation for moving it to common code, along with a
fair bit of cleanup on the usage.  The middle chunk moves the patches to
common KVM, and the last two chunks convert arm64 and MIPS to the common
implementation.

Fully tested on x86 only.  Compile tested patches 14-21 on arm64, MIPS,
s390 and PowerPC.
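
For orientation, this is roughly the shape the cache converges on by the end
of the series, pieced together from the diffs in this thread (the struct ends
up in include/linux/kvm_types.h, with KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
supplied by each arch's asm/kvm_types.h):

	struct kvm_mmu_memory_cache {
		int nobjs;
		gfp_t gfp_zero;
		struct kmem_cache *kmem_cache;
		void *objects[KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE];
	};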

v2:
  - Rebase to kvm-5.8-2, commit 49b3deaad345 ("Merge tag ...").
  - Use an asm-generic kvm_types.h for s390 and PowerPC instead of an
empty arch-specific file. [Marc]
  - Explicit document "GFP_PGTABLE_USER == GFP_KERNEL_ACCOUNT | GFP_ZERO"
in the arm64 conversion patch. [Marc]
  - Collect review tags. [Ben]

[1] https://lkml.kernel.org/r/20191105110357.8607-1-christoffer.dall@arm
[2] https://lkml.kernel.org/r/20190926231824.149014-4-bgar...@google.com

Sean Christopherson (21):
  KVM: x86/mmu: Track the associated kmem_cache in the MMU caches
  KVM: x86/mmu: Consolidate "page" variant of memory cache helpers
  KVM: x86/mmu: Use consistent "mc" name for kvm_mmu_memory_cache locals
  KVM: x86/mmu: Remove superfluous gotos from mmu_topup_memory_caches()
  KVM: x86/mmu: Try to avoid crashing KVM if a MMU memory cache is empty
  KVM: x86/mmu: Move fast_page_fault() call above
mmu_topup_memory_caches()
  KVM: x86/mmu: Topup memory caches after walking GVA->GPA
  KVM: x86/mmu: Clean up the gorilla math in mmu_topup_memory_caches()
  KVM: x86/mmu: Separate the memory caches for shadow pages and gfn
arrays
  KVM: x86/mmu: Make __GFP_ZERO a property of the memory cache
  KVM: x86/mmu: Zero allocate shadow pages (outside of mmu_lock)
  KVM: x86/mmu: Skip filling the gfn cache for guaranteed direct MMU
topups
  KVM: x86/mmu: Prepend "kvm_" to memory cache helpers that will be
global
  KVM: Move x86's version of struct kvm_mmu_memory_cache to common code
  KVM: Move x86's MMU memory cache helpers to common KVM code
  KVM: arm64: Drop @max param from mmu_topup_memory_cache()
  KVM: arm64: Use common code's approach for __GFP_ZERO with memory
caches
  KVM: arm64: Use common KVM implementation of MMU memory caches
  KVM: MIPS: Drop @max param from mmu_topup_memory_cache()
  KVM: MIPS: Account pages used for GPA page tables
  KVM: MIPS: Use common KVM implementation of MMU memory caches

 arch/arm64/include/asm/kvm_host.h  |  11 ---
 arch/arm64/include/asm/kvm_types.h |   8 ++
 arch/arm64/kvm/arm.c   |   2 +
 arch/arm64/kvm/mmu.c   |  54 +++-
 arch/mips/include/asm/kvm_host.h   |  11 ---
 arch/mips/include/asm/kvm_types.h  |   7 ++
 arch/mips/kvm/mmu.c|  44 ++
 arch/powerpc/include/asm/Kbuild|   1 +
 arch/s390/include/asm/Kbuild   |   1 +
 arch/x86/include/asm/kvm_host.h|  14 +---
 arch/x86/include/asm/kvm_types.h   |   7 ++
 arch/x86/kvm/mmu/mmu.c | 129 +
 arch/x86/kvm/mmu/paging_tmpl.h |  10 +--
 include/asm-generic/kvm_types.h|   5 ++
 include/linux/kvm_host.h   |   7 ++
 include/linux/kvm_types.h  |  19 +
 virt/kvm/kvm_main.c|  55 
 17 files changed, 175 insertions(+), 210 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_types.h
 create mode 100644 arch/mips/include/asm/kvm_types.h
 create mode 100644 arch/x86/include/asm/kvm_types.h
 create mode 100644 include/asm-generic/kvm_types.h

-- 
2.26.0



[PATCH v2 06/21] KVM: x86/mmu: Move fast_page_fault() call above mmu_topup_memory_caches()

2020-06-22 Thread Sean Christopherson
Avoid refilling the memory caches and potentially slow reclaim/swap when
handling a fast page fault, which does not need to allocate any new
objects.

Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5e773564ab20..4b4c3234d623 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4095,6 +4095,9 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
if (page_fault_handle_page_track(vcpu, error_code, gfn))
return RET_PF_EMULATE;
 
+   if (fast_page_fault(vcpu, gpa, error_code))
+   return RET_PF_RETRY;
+
r = mmu_topup_memory_caches(vcpu);
if (r)
return r;
@@ -4102,9 +4105,6 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
if (lpage_disallowed)
max_level = PG_LEVEL_4K;
 
-   if (fast_page_fault(vcpu, gpa, error_code))
-   return RET_PF_RETRY;
-
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
 
-- 
2.26.0



[PATCH v2 07/21] KVM: x86/mmu: Topup memory caches after walking GVA->GPA

2020-06-22 Thread Sean Christopherson
Topup memory caches after walking the GVA->GPA translation during a
shadow page fault; there is no need to ensure the caches are full when
walking the GVA.  As of commit f5a1e9f89504f ("KVM: MMU: remove call
to kvm_mmu_pte_write from walk_addr"), the FNAME(walk_addr) flow no
longer adds rmaps via kvm_mmu_pte_write().

This avoids allocating memory in the case that the GVA is unmapped in
the guest, and also provides a paper trail of why/when the memory caches
need to be filled.

Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/paging_tmpl.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 38c576495048..3de32122f601 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -791,10 +791,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
 
pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
 
-   r = mmu_topup_memory_caches(vcpu);
-   if (r)
-   return r;
-
/*
 * If PFEC.RSVD is set, this is a shadow page fault.
 * The bit needs to be cleared before walking guest page tables.
@@ -822,6 +818,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
return RET_PF_EMULATE;
}
 
+   r = mmu_topup_memory_caches(vcpu);
+   if (r)
+   return r;
+
vcpu->arch.write_fault_to_shadow_pgtable = false;
 
is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
-- 
2.26.0



[PATCH v2 08/21] KVM: x86/mmu: Clean up the gorilla math in mmu_topup_memory_caches()

2020-06-22 Thread Sean Christopherson
Clean up the minimums in mmu_topup_memory_caches() to document the
driving mechanisms behind the minimums.  Now that encountering an empty
cache is unlikely to trigger BUG_ON(), it is less dangerous to be more
precise when defining the minimums.

For rmaps, the logic is 1 parent PTE per level, plus a single rmap, and
prefetched rmaps.  The extra objects in the current '8 + PREFETCH'
minimum came about due to an abundance of paranoia in commit
c41ef344de212 ("KVM: MMU: increase per-vcpu rmap cache alloc size"),
i.e. it could have increased the minimum to 2 rmaps.  Furthermore, the
unexpected extra rmap case was killed off entirely by commits
f759e2b4c728c ("KVM: MMU: avoid pte_list_desc running out in
kvm_mmu_pte_write") and f5a1e9f89504f ("KVM: MMU: remove call to
kvm_mmu_pte_write from walk_addr").

For the so called page cache, replace '8' with 2*PT64_ROOT_MAX_LEVEL.
The 2x multiplier is needed because the cache is used for both shadow
pages and gfn arrays for indirect MMUs.

And finally, for page headers, replace '4' with PT64_ROOT_MAX_LEVEL.

Note, KVM now supports 5-level paging, i.e. the old minimums that used a
baseline derived from 4-level paging were technically wrong.  But, KVM
always allocates roots in a separate flow, e.g. it's impossible in the
current implementation to actually need 5 new shadow pages in a single
flow.  Use PT64_ROOT_MAX_LEVEL unmodified instead of subtracting 1, as
the direct usage is likely more intuitive to uninformed readers, and the
inflated minimum is unlikely to affect functionality in practice.
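
For concreteness: assuming the tree's current PT64_ROOT_MAX_LEVEL of 5 and
PTE_PREFETCH_NUM of 8 (both assumptions about the reader's tree, not part of
this patch), the new minimums work out to:

	pte_list_desc cache: 1 + 5 + 8 = 14 objects (old: 8 + 8 = 16)
	page cache:          2 * 5     = 10 objects (old: 8)
	page header cache:   5 objects              (old: 4)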

Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4b4c3234d623..451e0365e5dd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1103,14 +1103,17 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
 {
int r;
 
+   /* 1 rmap, 1 parent PTE per level, and the prefetched rmaps. */
	r = mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
-  8 + PTE_PREFETCH_NUM);
+  1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
if (r)
return r;
-   r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_cache, 8);
+   r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_cache,
+  2 * PT64_ROOT_MAX_LEVEL);
if (r)
return r;
-   return mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache, 4);
+   return mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
+ PT64_ROOT_MAX_LEVEL);
 }
 
 static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
-- 
2.26.0



[PATCH v2 05/21] KVM: x86/mmu: Try to avoid crashing KVM if a MMU memory cache is empty

2020-06-22 Thread Sean Christopherson
Attempt to allocate a new object instead of crashing KVM (and likely the
kernel) if a memory cache is unexpectedly empty.  Use GFP_ATOMIC for the
allocation as the caches are used while holding mmu_lock.  The immediate
BUG_ON() makes the code unnecessarily explosive and led to confusing
minimums being used in the past, e.g. allocating 4 objects where 1 would
suffice.

Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index ba70de24a5b0..5e773564ab20 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1060,6 +1060,15 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
local_irq_enable();
 }
 
+static inline void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc,
+  gfp_t gfp_flags)
+{
+   if (mc->kmem_cache)
+   return kmem_cache_zalloc(mc->kmem_cache, gfp_flags);
+   else
+   return (void *)__get_free_page(gfp_flags);
+}
+
 static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
 {
void *obj;
@@ -1067,10 +1076,7 @@ static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
if (mc->nobjs >= min)
return 0;
while (mc->nobjs < ARRAY_SIZE(mc->objects)) {
-   if (mc->kmem_cache)
-   obj = kmem_cache_zalloc(mc->kmem_cache, GFP_KERNEL_ACCOUNT);
-   else
-   obj = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
+   obj = mmu_memory_cache_alloc_obj(mc, GFP_KERNEL_ACCOUNT);
if (!obj)
return mc->nobjs >= min ? 0 : -ENOMEM;
mc->objects[mc->nobjs++] = obj;
@@ -1118,8 +1124,11 @@ static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
 {
void *p;
 
-   BUG_ON(!mc->nobjs);
-   p = mc->objects[--mc->nobjs];
+   if (WARN_ON(!mc->nobjs))
+   p = mmu_memory_cache_alloc_obj(mc, GFP_ATOMIC | __GFP_ACCOUNT);
+   else
+   p = mc->objects[--mc->nobjs];
+   BUG_ON(!p);
return p;
 }
 
-- 
2.26.0



[PATCH v2 11/21] KVM: x86/mmu: Zero allocate shadow pages (outside of mmu_lock)

2020-06-22 Thread Sean Christopherson
Set __GFP_ZERO for the shadow page memory cache and drop the explicit
clear_page() from kvm_mmu_get_page().  This moves the cost of zeroing a
page to the allocation time of the physical page, i.e. when topping up
the memory caches, and thus avoids having to zero out an entire page
while holding mmu_lock.

Cc: Peter Feiner 
Cc: Peter Shier 
Cc: Junaid Shahid 
Cc: Jim Mattson 
Suggested-by: Ben Gardon 
Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6b0ec9060786..a8f8eebf67df 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2545,7 +2545,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
if (level > PG_LEVEL_4K && need_sync)
	flush |= kvm_sync_pages(vcpu, gfn, &invalid_list);
}
-   clear_page(sp->spt);
trace_kvm_mmu_get_page(sp, true);
 
	kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
@@ -5687,6 +5686,8 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
vcpu->arch.mmu_page_header_cache.kmem_cache = mmu_page_header_cache;
vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO;
 
+   vcpu->arch.mmu_shadow_page_cache.gfp_zero = __GFP_ZERO;
+
	vcpu->arch.mmu = &vcpu->arch.root_mmu;
	vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
 
-- 
2.26.0



[PATCH v2 10/21] KVM: x86/mmu: Make __GFP_ZERO a property of the memory cache

2020-06-22 Thread Sean Christopherson
Add a gfp_zero flag to 'struct kvm_mmu_memory_cache' and use it to
control __GFP_ZERO instead of hardcoding a call to kmem_cache_zalloc().
A future patch needs such a flag for the __get_free_page() path, as
gfn arrays do not need/want the allocator to zero the memory.  Convert
the kmem_cache paths to __GFP_ZERO now so as to avoid a weird and
inconsistent API in the future.

No functional change intended.

Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/mmu/mmu.c  | 7 ++++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 376e1653ac41..67b84aa2984e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -251,6 +251,7 @@ struct kvm_kernel_irq_routing_entry;
  */
 struct kvm_mmu_memory_cache {
int nobjs;
+   gfp_t gfp_zero;
struct kmem_cache *kmem_cache;
void *objects[KVM_NR_MEM_OBJS];
 };
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d245acece3cd..6b0ec9060786 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1063,8 +1063,10 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
 static inline void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc,
   gfp_t gfp_flags)
 {
+   gfp_flags |= mc->gfp_zero;
+
if (mc->kmem_cache)
-   return kmem_cache_zalloc(mc->kmem_cache, gfp_flags);
+   return kmem_cache_alloc(mc->kmem_cache, gfp_flags);
else
return (void *)__get_free_page(gfp_flags);
 }
@@ -5680,7 +5682,10 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
int ret;
 
vcpu->arch.mmu_pte_list_desc_cache.kmem_cache = pte_list_desc_cache;
+   vcpu->arch.mmu_pte_list_desc_cache.gfp_zero = __GFP_ZERO;
+
vcpu->arch.mmu_page_header_cache.kmem_cache = mmu_page_header_cache;
+   vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO;
 
	vcpu->arch.mmu = &vcpu->arch.root_mmu;
	vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
-- 
2.26.0



[RFC v5 08/10] drm/nouveau/kms/nv50-: Expose nv50_outp_atom in disp.h

2020-06-22 Thread Lyude Paul
In order to make sure that we flush disable updates at the right time
when disabling CRCs, we'll need to be able to look at the outp state to
see if we're changing it at the same time that we're disabling CRCs.

So, expose the struct in disp.h.

Signed-off-by: Lyude Paul 
---
 drivers/gpu/drm/nouveau/dispnv50/disp.c | 18 ------------------
 drivers/gpu/drm/nouveau/dispnv50/disp.h | 14 ++++++++++++++
 2 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index 368069a5b181a..090882794f7d6 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -57,24 +57,6 @@
 
 #include 
 
-/**
- * Atomic state
- */
-
-struct nv50_outp_atom {
-   struct list_head head;
-
-   struct drm_encoder *encoder;
-   bool flush_disable;
-
-   union nv50_outp_atom_mask {
-   struct {
-   bool ctrl:1;
-   };
-   u8 mask;
-   } set, clr;
-};
-
 /**
  * EVO channel
  */
diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.h b/drivers/gpu/drm/nouveau/dispnv50/disp.h
index 696e70a6b98b6..c7b72fa850995 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.h
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.h
@@ -71,6 +71,20 @@ struct nv50_dmac {
struct mutex lock;
 };
 
+struct nv50_outp_atom {
+   struct list_head head;
+
+   struct drm_encoder *encoder;
+   bool flush_disable;
+
+   union nv50_outp_atom_mask {
+   struct {
+   bool ctrl:1;
+   };
+   u8 mask;
+   } set, clr;
+};
+
 int nv50_dmac_create(struct nvif_device *device, struct nvif_object *disp,
 const s32 *oclass, u8 head, void *data, u32 size,
 u64 syncbuf, struct nv50_dmac *dmac);
-- 
2.26.2



Re: [PATCH 13/16] mm: support THP migration to device private memory

2020-06-22 Thread Zi Yan
On 22 Jun 2020, at 15:36, Ralph Campbell wrote:

> On 6/21/20 4:20 PM, Zi Yan wrote:
>> On 19 Jun 2020, at 17:56, Ralph Campbell wrote:
>>
>>> Support transparent huge page migration to ZONE_DEVICE private memory.
>>> A new flag (MIGRATE_PFN_COMPOUND) is added to the input PFN array to
>>> indicate the huge page was fully mapped by the CPU.
>>> Export prep_compound_page() so that device drivers can create huge
>>> device private pages after calling memremap_pages().
>>>
>>> Signed-off-by: Ralph Campbell 
>>> ---
>>>   include/linux/migrate.h |   1 +
>>>   include/linux/mm.h  |   1 +
>>>   mm/huge_memory.c|  30 --
>>>   mm/internal.h   |   1 -
>>>   mm/memory.c |  10 +-
>>>   mm/memremap.c   |   9 +-
>>>   mm/migrate.c| 226 
>>>   mm/page_alloc.c |   1 +
>>>   8 files changed, 226 insertions(+), 53 deletions(-)
>>>
>>> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
>>> index 3e546cbf03dd..f6a64965c8bd 100644
>>> --- a/include/linux/migrate.h
>>> +++ b/include/linux/migrate.h
>>> @@ -166,6 +166,7 @@ static inline int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>>>   #define MIGRATE_PFN_MIGRATE   (1UL << 1)
>>>   #define MIGRATE_PFN_LOCKED(1UL << 2)
>>>   #define MIGRATE_PFN_WRITE (1UL << 3)
>>> +#define MIGRATE_PFN_COMPOUND   (1UL << 4)
>>>   #define MIGRATE_PFN_SHIFT 6
>>>
>>>   static inline struct page *migrate_pfn_to_page(unsigned long mpfn)
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index dc7b87310c10..020b9dd3cddb 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -932,6 +932,7 @@ static inline unsigned int page_shift(struct page *page)
>>>   }
>>>
>>>   void free_compound_page(struct page *page);
>>> +void prep_compound_page(struct page *page, unsigned int order);
>>>
>>>   #ifdef CONFIG_MMU
>>>   /*
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 78c84bee7e29..25d95f7b1e98 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -1663,23 +1663,35 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>> } else {
>>> struct page *page = NULL;
>>> int flush_needed = 1;
>>> +   bool is_anon = false;
>>>
>>> if (pmd_present(orig_pmd)) {
>>> page = pmd_page(orig_pmd);
>>> +   is_anon = PageAnon(page);
>>> page_remove_rmap(page, true);
>>> VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
>>> VM_BUG_ON_PAGE(!PageHead(page), page);
>>> } else if (thp_migration_supported()) {
>>> swp_entry_t entry;
>>>
>>> -   VM_BUG_ON(!is_pmd_migration_entry(orig_pmd));
>>> entry = pmd_to_swp_entry(orig_pmd);
>>> -   page = pfn_to_page(swp_offset(entry));
>>> +   if (is_device_private_entry(entry)) {
>>> +   page = device_private_entry_to_page(entry);
>>> +   is_anon = PageAnon(page);
>>> +   page_remove_rmap(page, true);
>>> +   VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
>>> +   VM_BUG_ON_PAGE(!PageHead(page), page);
>>> +   put_page(page);
>>
>> Why do you hide this code behind thp_migration_supported()? It seems that
>> you just need pmd swap entry not pmd migration entry. Also the condition is
>> not consistent with the code in __handle_mm_fault(), in which you handle
>> is_device_private_entry() directly without checking thp_migration_support().
>
> Good point, I think "else if (thp_migration_supported())" should be
> "else if (is_pmd_migration_entry(orig_pmd))" since if the PMD *is*
> a device private or migration entry, then it should be handled and the
> VM_BUG_ON() should be that thp_migration_supported() is true
> (or maybe remove the VM_BUG_ON?).

I disagree. A device private entry is independent of a PMD migration entry,
since a device private entry is just a swap entry, which is available whenever
CONFIG_TRANSPARENT_HUGEPAGE is enabled. So for architectures that support THP
but not THP migration (like ARM64), your code should still work.

I would suggest you check all the uses of is_swap_pmd() and make sure the code
can handle is_device_private_entry().

For new device private code, you might need to guard it either statically or
dynamically in case CONFIG_DEVICE_PRIVATE is disabled. Potentially, you would
want to make sure a system without CONFIG_DEVICE_PRIVATE never sees
is_device_private_entry() == true, and that it reports an error when it does.
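
Something along these lines, as an untested sketch against the zap_huge_pmd()
hunk quoted above (names come from the quoted diff; the IS_ENABLED() check is
only there to illustrate the static guard, since is_device_private_entry()
already compiles to false without CONFIG_DEVICE_PRIVATE):

	swp_entry_t entry = pmd_to_swp_entry(orig_pmd);

	if (IS_ENABLED(CONFIG_DEVICE_PRIVATE) &&
	    is_device_private_entry(entry)) {
		/* Huge device private page: a plain swap entry, valid on
		 * any CONFIG_TRANSPARENT_HUGEPAGE build. */
		page = device_private_entry_to_page(entry);
		page_remove_rmap(page, true);
		put_page(page);
	} else if (thp_migration_supported()) {
		VM_BUG_ON(!is_pmd_migration_entry(orig_pmd));
		page = pfn_to_page(swp_offset(entry));
	}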


>
>> Do we need to support split_huge_pmd() if a page is migrated to device? Any 
>> new code
>> needed in split_huge_pmd()?
>
> I was thinking that any CPU usage of the device private page would cause it
> to be migrated back to system memory as a whole 

[PATCH v2 01/21] KVM: x86/mmu: Track the associated kmem_cache in the MMU caches

2020-06-22 Thread Sean Christopherson
Track the kmem_cache used for non-page KVM MMU memory caches instead of
passing in the associated kmem_cache when filling the cache.  This will
allow consolidating code and other cleanups.

No functional change intended.

Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/mmu/mmu.c  | 24 +++++++++++-------------
 2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f8998e97457f..7b6ac8fad9c2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -251,6 +251,7 @@ struct kvm_kernel_irq_routing_entry;
  */
 struct kvm_mmu_memory_cache {
int nobjs;
+   struct kmem_cache *kmem_cache;
void *objects[KVM_NR_MEM_OBJS];
 };
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index fdd05c233308..0830c195c9ed 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1060,15 +1060,14 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
local_irq_enable();
 }
 
-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
- struct kmem_cache *base_cache, int min)
+static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, int min)
 {
void *obj;
 
if (cache->nobjs >= min)
return 0;
while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-   obj = kmem_cache_zalloc(base_cache, GFP_KERNEL_ACCOUNT);
+   obj = kmem_cache_zalloc(cache->kmem_cache, GFP_KERNEL_ACCOUNT);
if (!obj)
return cache->nobjs >= min ? 0 : -ENOMEM;
cache->objects[cache->nobjs++] = obj;
@@ -1081,11 +1080,10 @@ static int mmu_memory_cache_free_objects(struct kvm_mmu_memory_cache *cache)
return cache->nobjs;
 }
 
-static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc,
- struct kmem_cache *cache)
+static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
 {
while (mc->nobjs)
-   kmem_cache_free(cache, mc->objects[--mc->nobjs]);
+   kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]);
 }
 
 static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
@@ -1115,25 +1113,22 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
int r;
 
	r = mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
-  pte_list_desc_cache, 8 + PTE_PREFETCH_NUM);
+  8 + PTE_PREFETCH_NUM);
if (r)
goto out;
	r = mmu_topup_memory_cache_page(&vcpu->arch.mmu_page_cache, 8);
if (r)
goto out;
-   r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
-  mmu_page_header_cache, 4);
+   r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache, 4);
 out:
return r;
 }
 
 static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 {
-   mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
-   pte_list_desc_cache);
+   mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
	mmu_free_memory_cache_page(&vcpu->arch.mmu_page_cache);
-   mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache,
-   mmu_page_header_cache);
+   mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
 }
 
 static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
@@ -5684,6 +5679,9 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
uint i;
int ret;
 
+   vcpu->arch.mmu_pte_list_desc_cache.kmem_cache = pte_list_desc_cache;
+   vcpu->arch.mmu_page_header_cache.kmem_cache = mmu_page_header_cache;
+
	vcpu->arch.mmu = &vcpu->arch.root_mmu;
	vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
 
-- 
2.26.0



[RFC v5 09/10] drm/nouveau/kms/nv50-: Move hard-coded object handles into header

2020-06-22 Thread Lyude Paul
While most of the functionality on Nvidia GPUs doesn't require using an
explicit handle instead of the main VRAM handle + offset, there are a
couple of places that do require explicit handles, such as CRC
functionality. Since this means we're about to add another
nouveau-chosen handle, let's just go ahead and move any hard-coded
handles into a single header. This is just to keep things slightly
organized, and to make it a little bit easier if we need to add more
handles in the future.

This patch should contain no functional changes.

Changes since v3:
* Correct SPDX license identifier (checkpatch)

Signed-off-by: Lyude Paul 
---
 drivers/gpu/drm/nouveau/dispnv50/disp.c    |  7 +++++--
 drivers/gpu/drm/nouveau/dispnv50/handles.h | 15 +++++++++++++++
 drivers/gpu/drm/nouveau/dispnv50/wndw.c|  3 ++-
 3 files changed, 22 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/nouveau/dispnv50/handles.h

diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index 090882794f7d6..bf7ba1e1c0f74 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -26,6 +26,7 @@
 #include "core.h"
 #include "head.h"
 #include "wndw.h"
+#include "handles.h"
 
 #include 
 #include 
@@ -154,7 +155,8 @@ nv50_dmac_create(struct nvif_device *device, struct nvif_object *disp,
if (!syncbuf)
return 0;
 
-   ret = nvif_object_init(&dmac->base.user, 0xf0000000, NV_DMA_IN_MEMORY,
+   ret = nvif_object_init(&dmac->base.user, NV50_DISP_HANDLE_SYNCBUF,
+  NV_DMA_IN_MEMORY,
   &(struct nv_dma_v0) {
.target = NV_DMA_V0_TARGET_VRAM,
.access = NV_DMA_V0_ACCESS_RDWR,
@@ -165,7 +167,8 @@ nv50_dmac_create(struct nvif_device *device, struct nvif_object *disp,
if (ret)
return ret;
 
-   ret = nvif_object_init(&dmac->base.user, 0xf0000001, NV_DMA_IN_MEMORY,
+   ret = nvif_object_init(&dmac->base.user, NV50_DISP_HANDLE_VRAM,
+  NV_DMA_IN_MEMORY,
   &(struct nv_dma_v0) {
.target = NV_DMA_V0_TARGET_VRAM,
.access = NV_DMA_V0_ACCESS_RDWR,
diff --git a/drivers/gpu/drm/nouveau/dispnv50/handles.h b/drivers/gpu/drm/nouveau/dispnv50/handles.h
new file mode 100644
index 0000000000000..d1beeb9a444db
--- /dev/null
+++ b/drivers/gpu/drm/nouveau/dispnv50/handles.h
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: MIT
+#ifndef __NV50_KMS_HANDLES_H__
+#define __NV50_KMS_HANDLES_H__
+
+/*
+ * Various hard-coded object handles that nouveau uses. These are made-up by
+ * nouveau developers, not Nvidia. The only significance of the handles chosen
+ * is that they must all be unique.
+ */
+#define NV50_DISP_HANDLE_SYNCBUF   0xf0000000
+#define NV50_DISP_HANDLE_VRAM  0xf0000001
+
+#define NV50_DISP_HANDLE_WNDW_CTX(kind)(0xfb000000 | kind)
+
+#endif /* !__NV50_KMS_HANDLES_H__ */
diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
index cfee61f14aa49..9d963ecdd34e8 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
@@ -21,6 +21,7 @@
  */
 #include "wndw.h"
 #include "wimm.h"
+#include "handles.h"
 
 #include 
 #include 
@@ -59,7 +60,7 @@ nv50_wndw_ctxdma_new(struct nv50_wndw *wndw, struct drm_framebuffer *fb)
int ret;
 
	nouveau_framebuffer_get_layout(fb, &unused, &kind);
-   handle = 0xfb000000 | kind;
+   handle = NV50_DISP_HANDLE_WNDW_CTX(kind);
 
	list_for_each_entry(ctxdma, &wndw->ctxdma.list, head) {
if (ctxdma->object.handle == handle)
-- 
2.26.2



[PATCH v2 04/21] KVM: x86/mmu: Remove superfluous gotos from mmu_topup_memory_caches()

2020-06-22 Thread Sean Christopherson
Return errors directly from mmu_topup_memory_caches() instead of
branching to a label that does the same.

No functional change intended.

Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 36c90f004ef4..ba70de24a5b0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1100,13 +1100,11 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
	r = mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
   8 + PTE_PREFETCH_NUM);
if (r)
-   goto out;
+   return r;
	r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_cache, 8);
if (r)
-   goto out;
-   r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache, 4);
-out:
-   return r;
+   return r;
+   return mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache, 4);
 }
 
 static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
-- 
2.26.0



[PATCH v2 12/21] KVM: x86/mmu: Skip filling the gfn cache for guaranteed direct MMU topups

2020-06-22 Thread Sean Christopherson
Don't bother filling the gfn array cache when the caller is a fully
direct MMU, i.e. won't need a gfn array for shadow pages.

Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c | 18 ++++++++++--------
 arch/x86/kvm/mmu/paging_tmpl.h |  4 ++--
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a8f8eebf67df..8d66cf558f1b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1101,7 +1101,7 @@ static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
}
 }
 
-static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
+static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
 {
int r;
 
@@ -1114,10 +1114,12 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
   PT64_ROOT_MAX_LEVEL);
if (r)
return r;
-   r = mmu_topup_memory_cache(&vcpu->arch.mmu_gfn_array_cache,
-  PT64_ROOT_MAX_LEVEL);
-   if (r)
-   return r;
+   if (maybe_indirect) {
+   r = mmu_topup_memory_cache(&vcpu->arch.mmu_gfn_array_cache,
+  PT64_ROOT_MAX_LEVEL);
+   if (r)
+   return r;
+   }
	return mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
  PT64_ROOT_MAX_LEVEL);
 }
@@ -4107,7 +4109,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
if (fast_page_fault(vcpu, gpa, error_code))
return RET_PF_RETRY;
 
-   r = mmu_topup_memory_caches(vcpu);
+   r = mmu_topup_memory_caches(vcpu, false);
if (r)
return r;
 
@@ -5147,7 +5149,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 {
int r;
 
-   r = mmu_topup_memory_caches(vcpu);
+   r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->direct_map);
if (r)
goto out;
r = mmu_alloc_roots(vcpu);
@@ -5341,7 +5343,7 @@ static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 * or not since pte prefetch is skiped if it does not have
 * enough objects in the cache.
 */
-   mmu_topup_memory_caches(vcpu);
+   mmu_topup_memory_caches(vcpu, true);
 
	spin_lock(&vcpu->kvm->mmu_lock);
 
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 3de32122f601..ac39710d0594 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -818,7 +818,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
return RET_PF_EMULATE;
}
 
-   r = mmu_topup_memory_caches(vcpu);
+   r = mmu_topup_memory_caches(vcpu, true);
if (r)
return r;
 
@@ -905,7 +905,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
 * No need to check return value here, rmap_can_add() can
 * help us to skip pte prefetch later.
 */
-   mmu_topup_memory_caches(vcpu);
+   mmu_topup_memory_caches(vcpu, true);
 
if (!VALID_PAGE(root_hpa)) {
WARN_ON(1);
-- 
2.26.0



[PATCH v2 13/21] KVM: x86/mmu: Prepend "kvm_" to memory cache helpers that will be global

2020-06-22 Thread Sean Christopherson
Rename the memory helpers that will soon be moved to common code and be
made globally available via linux/kvm_host.h.  "mmu" alone is not a
sufficient namespace for globally available KVM symbols.

Opportunistically add "nr_" in mmu_memory_cache_free_objects() to make
it clear the function returns the number of free objects, as opposed to
freeing existing objects.

Suggested-by: Christoffer Dall 
Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c | 42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8d66cf558f1b..b85d3e8e8403 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1071,7 +1071,7 @@ static inline void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc,
return (void *)__get_free_page(gfp_flags);
 }
 
-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
+static int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
 {
void *obj;
 
@@ -1086,12 +1086,12 @@ static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
return 0;
 }
 
-static int mmu_memory_cache_free_objects(struct kvm_mmu_memory_cache *mc)
+static int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc)
 {
return mc->nobjs;
 }
 
-static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
+static void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
 {
while (mc->nobjs) {
if (mc->kmem_cache)
@@ -1106,33 +1106,33 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
int r;
 
/* 1 rmap, 1 parent PTE per level, and the prefetched rmaps. */
-   r = mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
-  1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
+   r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
+  1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
if (r)
return r;
-   r = mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
-  PT64_ROOT_MAX_LEVEL);
+   r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
+  PT64_ROOT_MAX_LEVEL);
if (r)
return r;
if (maybe_indirect) {
-   r = mmu_topup_memory_cache(&vcpu->arch.mmu_gfn_array_cache,
-  PT64_ROOT_MAX_LEVEL);
+   r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_gfn_array_cache,
+  PT64_ROOT_MAX_LEVEL);
if (r)
return r;
}
-   return mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
- PT64_ROOT_MAX_LEVEL);
+   return kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
+ PT64_ROOT_MAX_LEVEL);
 }
 
 static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 {
-   mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
-   mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
-   mmu_free_memory_cache(&vcpu->arch.mmu_gfn_array_cache);
-   mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
+   kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
+   kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
+   kvm_mmu_free_memory_cache(&vcpu->arch.mmu_gfn_array_cache);
+   kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
 }
 
-static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
+static void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
 {
void *p;
 
@@ -1146,7 +1146,7 @@ static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
 
 static struct pte_list_desc *mmu_alloc_pte_list_desc(struct kvm_vcpu *vcpu)
 {
-   return mmu_memory_cache_alloc(&vcpu->arch.mmu_pte_list_desc_cache);
+   return kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_pte_list_desc_cache);
 }
 
 static void mmu_free_pte_list_desc(struct pte_list_desc *pte_list_desc)
@@ -1417,7 +1417,7 @@ static bool rmap_can_add(struct kvm_vcpu *vcpu)
struct kvm_mmu_memory_cache *mc;
 
	mc = &vcpu->arch.mmu_pte_list_desc_cache;
-   return mmu_memory_cache_free_objects(mc);
+   return kvm_mmu_memory_cache_nr_free_objects(mc);
 }
 
 static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
@@ -2104,10 +2104,10 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct
 {
struct kvm_mmu_page *sp;
 
-   sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
-   sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
+   sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
+   sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
if (!direct)
-

[PATCH v2 19/21] KVM: MIPS: Drop @max param from mmu_topup_memory_cache()

2020-06-22 Thread Sean Christopherson
Drop the @max param from mmu_topup_memory_cache() and instead use
ARRAY_SIZE() to terminate the loop that fills the cache.  This removes a
BUG_ON() and sets the stage for moving MIPS to the common memory cache
implementation.

No functional change intended.

Signed-off-by: Sean Christopherson 
---
 arch/mips/kvm/mmu.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index 7dad7a293eae..94562c54b930 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -25,15 +25,13 @@
 #define KVM_MMU_CACHE_MIN_PAGES 2
 #endif
 
-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
- int min, int max)
+static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, int min)
 {
void *page;
 
-   BUG_ON(max > KVM_NR_MEM_OBJS);
if (cache->nobjs >= min)
return 0;
-   while (cache->nobjs < max) {
+   while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
page = (void *)__get_free_page(GFP_KERNEL);
if (!page)
return -ENOMEM;
@@ -711,8 +709,7 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
goto out;
 
/* We need a minimum of cached pages ready for page table creation */
-   err = mmu_topup_memory_cache(memcache, KVM_MMU_CACHE_MIN_PAGES,
-KVM_NR_MEM_OBJS);
+   err = mmu_topup_memory_cache(memcache, KVM_MMU_CACHE_MIN_PAGES);
if (err)
goto out;
 
@@ -796,8 +793,7 @@ static pte_t *kvm_trap_emul_pte_for_gva(struct kvm_vcpu *vcpu,
int ret;
 
/* We need a minimum of cached pages ready for page table creation */
-   ret = mmu_topup_memory_cache(memcache, KVM_MMU_CACHE_MIN_PAGES,
-KVM_NR_MEM_OBJS);
+   ret = mmu_topup_memory_cache(memcache, KVM_MMU_CACHE_MIN_PAGES);
if (ret)
return NULL;
 
-- 
2.26.0



Re: [PATCH] binder: fix null deref of proc->context

2020-06-22 Thread Christian Brauner
On Mon, Jun 22, 2020 at 01:07:15PM -0700, Todd Kjos wrote:
> The binder driver makes the assumption proc->context pointer is invariant
> after initialization (as documented in the kerneldoc header for struct proc).
> However, in commit f0fe2c0f050d ("binder: prevent UAF for binderfs devices II")
> proc->context is set to NULL during binder_deferred_release().
> 
> Another proc was in the middle of setting up a transaction to the dying
> process and crashed on a NULL pointer deref on "context" which is a local
> set to &proc->context:
> 
> new_ref->data.desc = (node == context->binder_context_mgr_node) ? 0 : 1;
> 
> Here's the stack:
> 
> [ 5237.855435] Call trace:
> [ 5237.855441] binder_get_ref_for_node_olocked+0x100/0x2ec
> [ 5237.855446] binder_inc_ref_for_node+0x140/0x280
> [ 5237.855451] binder_translate_binder+0x1d0/0x388
> [ 5237.855456] binder_transaction+0x2228/0x3730
> [ 5237.855461] binder_thread_write+0x640/0x25bc
> [ 5237.855466] binder_ioctl_write_read+0xb0/0x464
> [ 5237.855471] binder_ioctl+0x30c/0x96c
> [ 5237.855477] do_vfs_ioctl+0x3e0/0x700
> [ 5237.855482] __arm64_sys_ioctl+0x78/0xa4
> [ 5237.855488] el0_svc_common+0xb4/0x194
> [ 5237.855493] el0_svc_handler+0x74/0x98
> [ 5237.855497] el0_svc+0x8/0xc
> 
> The fix is to move the kfree of the binder_device to binder_free_proc()
> so the binder_device is freed when we know there are no references
> remaining on the binder_proc.
> 
> Fixes: f0fe2c0f050d ("binder: prevent UAF for binderfs devices II")
> Signed-off-by: Todd Kjos 

Thanks, looks good to me!
Acked-by: Christian Brauner 

Christian

> ---
>  drivers/android/binder.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> index e47c8a4c83db..f50c5f182bb5 100644
> --- a/drivers/android/binder.c
> +++ b/drivers/android/binder.c
> @@ -4686,8 +4686,15 @@ static struct binder_thread *binder_get_thread(struct binder_proc *proc)
>  
>  static void binder_free_proc(struct binder_proc *proc)
>  {
> + struct binder_device *device;
> +
>   BUG_ON(!list_empty(&proc->todo));
>   BUG_ON(!list_empty(&proc->delivered_death));
> + device = container_of(proc->context, struct binder_device, context);
> + if (refcount_dec_and_test(&device->ref)) {
> + kfree(proc->context->name);
> + kfree(device);
> + }
>   binder_alloc_deferred_release(&proc->alloc);
>   put_task_struct(proc->tsk);
>   binder_stats_deleted(BINDER_STAT_PROC);
> @@ -5406,7 +5413,6 @@ static int binder_node_release(struct binder_node *node, int refs)
>  static void binder_deferred_release(struct binder_proc *proc)
>  {
>   struct binder_context *context = proc->context;
> - struct binder_device *device;
>   struct rb_node *n;
>   int threads, nodes, incoming_refs, outgoing_refs, active_transactions;
>  
> @@ -5423,12 +5429,6 @@ static void binder_deferred_release(struct binder_proc *proc)
>   context->binder_context_mgr_node = NULL;
>   }
>   mutex_unlock(>context_mgr_node_lock);
> - device = container_of(proc->context, struct binder_device, context);
> - if (refcount_dec_and_test(&device->ref)) {
> - kfree(context->name);
> - kfree(device);
> - }
> - proc->context = NULL;
>   binder_inner_proc_lock(proc);
>   /*
>* Make sure proc stays alive after we
> -- 
> 2.27.0.111.gc72c7da667-goog
> 


[PATCH v2 21/21] KVM: MIPS: Use common KVM implementation of MMU memory caches

2020-06-22 Thread Sean Christopherson
Move to the common MMU memory cache implementation now that the common
code and MIPS's existing code are semantically compatible.

No functional change intended.

Suggested-by: Christoffer Dall 
Signed-off-by: Sean Christopherson 
---
 arch/mips/include/asm/Kbuild  |  1 -
 arch/mips/include/asm/kvm_host.h  | 11 -----------
 arch/mips/include/asm/kvm_types.h |  7 +++++++
 arch/mips/kvm/mmu.c   | 40 +++++-----------------------------------
 4 files changed, 12 insertions(+), 47 deletions(-)
 create mode 100644 arch/mips/include/asm/kvm_types.h

diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 397e6d24d2ab..8643d313890e 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -5,7 +5,6 @@ generated-y += syscall_table_64_n32.h
 generated-y += syscall_table_64_n64.h
 generated-y += syscall_table_64_o32.h
 generic-y += export.h
-generic-y += kvm_types.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += parport.h
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 363e7a89d173..f49617175f60 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -335,17 +335,6 @@ struct kvm_mips_tlb {
long tlb_lo[2];
 };
 
-#define KVM_NR_MEM_OBJS 4
-
-/*
- * We don't want allocation failures within the mmu code, so we preallocate
- * enough memory for a single page fault in a cache.
- */
-struct kvm_mmu_memory_cache {
-   int nobjs;
-   void *objects[KVM_NR_MEM_OBJS];
-};
-
 #define KVM_MIPS_AUX_FPU   0x1
 #define KVM_MIPS_AUX_MSA   0x2
 
diff --git a/arch/mips/include/asm/kvm_types.h b/arch/mips/include/asm/kvm_types.h
new file mode 100644
index 000000000000..213754d9ef6b
--- /dev/null
+++ b/arch/mips/include/asm/kvm_types.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_MIPS_KVM_TYPES_H
+#define _ASM_MIPS_KVM_TYPES_H
+
+#define KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE 4
+
+#endif /* _ASM_MIPS_KVM_TYPES_H */
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index 41a4a063a730..d6acd88c0c46 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -25,39 +25,9 @@
 #define KVM_MMU_CACHE_MIN_PAGES 2
 #endif
 
-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, int min)
-{
-   void *page;
-
-   if (cache->nobjs >= min)
-   return 0;
-   while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-   page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
-   if (!page)
-   return -ENOMEM;
-   cache->objects[cache->nobjs++] = page;
-   }
-   return 0;
-}
-
-static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
-{
-   while (mc->nobjs)
-   free_page((unsigned long)mc->objects[--mc->nobjs]);
-}
-
-static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
-{
-   void *p;
-
-   BUG_ON(!mc || !mc->nobjs);
-   p = mc->objects[--mc->nobjs];
-   return p;
-}
-
 void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 {
-   mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+   kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
 }
 
 /**
@@ -151,7 +121,7 @@ static pte_t *kvm_mips_walk_pgd(pgd_t *pgd, struct kvm_mmu_memory_cache *cache,
 
if (!cache)
return NULL;
-   new_pmd = mmu_memory_cache_alloc(cache);
+   new_pmd = kvm_mmu_memory_cache_alloc(cache);
pmd_init((unsigned long)new_pmd,
 (unsigned long)invalid_pte_table);
pud_populate(NULL, pud, new_pmd);
@@ -162,7 +132,7 @@ static pte_t *kvm_mips_walk_pgd(pgd_t *pgd, struct kvm_mmu_memory_cache *cache,
 
if (!cache)
return NULL;
-   new_pte = mmu_memory_cache_alloc(cache);
+   new_pte = kvm_mmu_memory_cache_alloc(cache);
clear_page(new_pte);
pmd_populate_kernel(NULL, pmd, new_pte);
}
@@ -709,7 +679,7 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, unsigned long gpa,
goto out;
 
/* We need a minimum of cached pages ready for page table creation */
-   err = mmu_topup_memory_cache(memcache, KVM_MMU_CACHE_MIN_PAGES);
+   err = kvm_mmu_topup_memory_cache(memcache, KVM_MMU_CACHE_MIN_PAGES);
if (err)
goto out;
 
@@ -793,7 +763,7 @@ static pte_t *kvm_trap_emul_pte_for_gva(struct kvm_vcpu *vcpu,
int ret;
 
/* We need a minimum of cached pages ready for page table creation */
-   ret = mmu_topup_memory_cache(memcache, KVM_MMU_CACHE_MIN_PAGES);
+   ret = kvm_mmu_topup_memory_cache(memcache, KVM_MMU_CACHE_MIN_PAGES);
if (ret)
return NULL;
 
-- 
2.26.0



[PATCH v2 15/21] KVM: Move x86's MMU memory cache helpers to common KVM code

2020-06-22 Thread Sean Christopherson
Move x86's memory cache helpers to common KVM code so that they can be
reused by arm64 and MIPS in future patches.
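
For reference, the calling pattern from arch code once these go global looks
roughly like this (a schematic sketch, not a quote of any one caller; "min"
stands in for whatever minimum the arch computes):

	/* Fill the cache with GFP_KERNEL allocations, outside of mmu_lock. */
	r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_page_cache, min);
	if (r)
		return r;

	...

	/* Consume pre-filled objects while holding mmu_lock. */
	p = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache);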

Suggested-by: Christoffer Dall 
Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c   | 53 -----------------------------------------------------
 include/linux/kvm_host.h |  7 +++++++
 virt/kvm/kvm_main.c  | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 62 insertions(+), 53 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b85d3e8e8403..a627437f73fd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1060,47 +1060,6 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
local_irq_enable();
 }
 
-static inline void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc,
-  gfp_t gfp_flags)
-{
-   gfp_flags |= mc->gfp_zero;
-
-   if (mc->kmem_cache)
-   return kmem_cache_alloc(mc->kmem_cache, gfp_flags);
-   else
-   return (void *)__get_free_page(gfp_flags);
-}
-
-static int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
-{
-   void *obj;
-
-   if (mc->nobjs >= min)
-   return 0;
-   while (mc->nobjs < ARRAY_SIZE(mc->objects)) {
-   obj = mmu_memory_cache_alloc_obj(mc, GFP_KERNEL_ACCOUNT);
-   if (!obj)
-   return mc->nobjs >= min ? 0 : -ENOMEM;
-   mc->objects[mc->nobjs++] = obj;
-   }
-   return 0;
-}
-
-static int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc)
-{
-   return mc->nobjs;
-}
-
-static void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
-{
-   while (mc->nobjs) {
-   if (mc->kmem_cache)
-   kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]);
-   else
-   free_page((unsigned long)mc->objects[--mc->nobjs]);
-   }
-}
-
 static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
 {
int r;
@@ -1132,18 +1091,6 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
kvm_mmu_free_memory_cache(>arch.mmu_page_header_cache);
 }
 
-static void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
-{
-   void *p;
-
-   if (WARN_ON(!mc->nobjs))
-   p = mmu_memory_cache_alloc_obj(mc, GFP_ATOMIC | __GFP_ACCOUNT);
-   else
-   p = mc->objects[--mc->nobjs];
-   BUG_ON(!p);
-   return p;
-}
-
 static struct pte_list_desc *mmu_alloc_pte_list_desc(struct kvm_vcpu *vcpu)
 {
	return kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_pte_list_desc_cache);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 62ec926c78a0..d35e397dad6a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -816,6 +816,13 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu, bool usermode_vcpu_not_eligible);
 void kvm_flush_remote_tlbs(struct kvm *kvm);
 void kvm_reload_remote_mmus(struct kvm *kvm);
 
+#ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
+int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min);
+int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc);
+void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
+void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
+#endif
+
 bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req,
 struct kvm_vcpu *except,
 unsigned long *vcpu_bitmap, cpumask_var_t tmp);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7b6013f2ba19..9f019b552dcf 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -341,6 +341,61 @@ void kvm_reload_remote_mmus(struct kvm *kvm)
kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
 }
 
+#ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
+static inline void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc,
+  gfp_t gfp_flags)
+{
+   gfp_flags |= mc->gfp_zero;
+
+   if (mc->kmem_cache)
+   return kmem_cache_alloc(mc->kmem_cache, gfp_flags);
+   else
+   return (void *)__get_free_page(gfp_flags);
+}
+
+int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
+{
+   void *obj;
+
+   if (mc->nobjs >= min)
+   return 0;
+   while (mc->nobjs < ARRAY_SIZE(mc->objects)) {
+   obj = mmu_memory_cache_alloc_obj(mc, GFP_KERNEL_ACCOUNT);
+   if (!obj)
+   return mc->nobjs >= min ? 0 : -ENOMEM;
+   mc->objects[mc->nobjs++] = obj;
+   }
+   return 0;
+}
+
+int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc)
+{
+   return mc->nobjs;
+}
+
+void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
+{
+   while (mc->nobjs) {
+   if (mc->kmem_cache)
+   

[RFC v5 10/10] drm/nouveau/kms/nvd9-: Add CRC support

2020-06-22 Thread Lyude Paul
This introduces support for CRC readback on gf119+, using the
documentation generously provided to us by Nvidia:

https://github.com/NVIDIA/open-gpu-doc/blob/master/Display-CRC/display-crc.txt

We expose all available CRC sources. SF, SOR, PIOR, and DAC are exposed
through a single set of "outp" sources: outp-active/auto for a CRC of
the scanout region, outp-complete for a CRC of both the scanout and
blanking/sync region combined, and outp-inactive for a CRC of only the
blanking/sync region. For each source, nouveau selects the appropriate
tap point based on the output path in use. We also expose an "rg"
source, which allows for capturing CRCs of the scanout raster before
it's encoded into a video signal in the output path. This tap point is
referred to as the raster generator.

Note that while there are some other neat features that can be used with
CRC capture on nvidia hardware, like capturing from two CRC sources
simultaneously, I couldn't see any use case for them and did not
implement them.

Nvidia only allows for accessing CRCs through a shared DMA region that
we program through the core EVO/NvDisplay channel which is referred to
as the notifier context. The notifier context is limited to either 255
(for Fermi-Pascal) or 2047 (Volta+) entries to store CRCs in, and
unfortunately the hardware simply drops CRCs and reports an overflow
once all available entries in the notifier context are filled.

Since the DRM CRC API and igt-gpu-tools don't expect there to be a limit
on how many CRCs can be captured, we work around this in nouveau by
allocating two separate notifier contexts for each head instead of one.
We schedule a vblank worker ahead of time so that once we start getting
close to filling up all of the available entries in the notifier
context, we can swap the currently used notifier context out with
another pre-prepared notifier context in a manner similar to page
flipping.

Unfortunately, the hardware only allows us to do this by flushing two
separate updates on the core channel: one to release the current
notifier context handle, and one to program the next notifier context's
handle. When the hardware processes the first update, the CRC for the
current frame is lost. However, the second update can be flushed
immediately without waiting for the first to complete so that CRC
generation resumes on the next frame. According to Nvidia's hardware
engineers, there isn't any cleaner way of flipping notifier contexts
that would avoid this.

Since using vblank workers to swap out the notifier context will ensure
we can usually flush both updates to hardware within the timespan of a
single frame, we can also ensure that there will only be exactly one
frame lost between the first and second update being executed by the
hardware. This gives us the guarantee that we're always correctly
matching each CRC entry with its respective frame even after a context
flip. And since IGT will retrieve the CRC entry for a frame by waiting
until it receives a CRC for any subsequent frames, this doesn't cause an
issue with any tests and is much simpler than trying to change the
current DRM API to accommodate.
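
Roughly, the flip sequence looks like this (a simplified pseudo-C sketch; the
helper names are made up for illustration and are not nouveau's actual
functions):

	/* Runs from the vblank worker once the active notifier context is
	 * nearly full. */
	static void crc_ctx_flip(struct crc *crc)
	{
		/* Update 1: release the current notifier context.  The CRC
		 * for the frame this update lands on is lost. */
		core_release_notifier_ctx(crc);
		core_flush_update(crc);

		/* Update 2: program the pre-prepared context.  Flushed
		 * immediately, without waiting for update 1, so CRC
		 * generation resumes on the very next frame. */
		core_set_notifier_ctx(crc, crc->ctx[crc->idx ^= 1]);
		core_flush_update(crc);
	}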

In order to facilitate testing of correct handling of this limitation,
we also expose a debugfs interface to manually control the threshold for
when we start trying to flip the notifier context. We will use this in
igt to trigger a context flip for testing purposes without needing to
wait for the notifier to completely fill up. This threshold is reset
to the default value set by nouveau after each capture, and is exposed
in a separate folder within each CRTC's debugfs directory labelled
"nv_crc".

Changes since v1:
* Forgot to finish saving crc.h before saving, whoops. This just adds
  some corrections to the empty function declarations that we use if
  CONFIG_DEBUG_FS isn't enabled.
Changes since v2:
* Don't check return code from debugfs_create_dir() or
  debugfs_create_file() - Greg K-H
Changes since v3:
  (no functional changes)
* Fix SPDX license identifiers (checkpatch)
* s/uint32_t/u32/ (checkpatch)
* Fix indenting in switch cases (checkpatch)
Changes since v4:
* Remove unneeded param changes with nv50_head_flush_clr/set
* Rebase

Signed-off-by: Lyude Paul 
---
 drivers/gpu/drm/nouveau/dispnv04/crtc.c |  25 +-
 drivers/gpu/drm/nouveau/dispnv50/Kbuild |   4 +
 drivers/gpu/drm/nouveau/dispnv50/atom.h |  20 +
 drivers/gpu/drm/nouveau/dispnv50/core.h |   4 +
 drivers/gpu/drm/nouveau/dispnv50/core907d.c |   3 +
 drivers/gpu/drm/nouveau/dispnv50/core917d.c |   3 +
 drivers/gpu/drm/nouveau/dispnv50/corec37d.c |   3 +
 drivers/gpu/drm/nouveau/dispnv50/corec57d.c |   3 +
 drivers/gpu/drm/nouveau/dispnv50/crc.c  | 714 
 drivers/gpu/drm/nouveau/dispnv50/crc.h  | 125 
 drivers/gpu/drm/nouveau/dispnv50/crc907d.c  | 139 
 drivers/gpu/drm/nouveau/dispnv50/crcc37d.c  | 153 +
 drivers/gpu/drm/nouveau/dispnv50/disp.c |  17 +
 drivers/gpu/drm/nouveau/dispnv50/disp.h |  

[PATCH v2 16/21] KVM: arm64: Drop @max param from mmu_topup_memory_cache()

2020-06-22 Thread Sean Christopherson
Replace the @max param in mmu_topup_memory_cache() and instead use
ARRAY_SIZE() to terminate the loop to fill the cache.  This removes a
BUG_ON() and sets the stage for moving arm64 to the common memory cache
implementation.

No functional change intended.

Tested-by: Marc Zyngier 
Signed-off-by: Sean Christopherson 
---
 arch/arm64/kvm/mmu.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index a1f6bc70c4e4..9398b66f8a87 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -124,15 +124,13 @@ static void stage2_dissolve_pud(struct kvm *kvm, 
phys_addr_t addr, pud_t *pudp)
put_page(virt_to_page(pudp));
 }
 
-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
- int min, int max)
+static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, int min)
 {
void *page;
 
-   BUG_ON(max > KVM_NR_MEM_OBJS);
if (cache->nobjs >= min)
return 0;
-   while (cache->nobjs < max) {
+   while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
page = (void *)__get_free_page(GFP_PGTABLE_USER);
if (!page)
return -ENOMEM;
@@ -1356,8 +1354,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t 
guest_ipa,
pte = kvm_s2pte_mkwrite(pte);
 
ret = mmu_topup_memory_cache(&cache,
-kvm_mmu_cache_min_pages(kvm),
-KVM_NR_MEM_OBJS);
+kvm_mmu_cache_min_pages(kvm));
if (ret)
goto out;
spin_lock(&kvm->mmu_lock);
@@ -1737,8 +1734,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
up_read(&current->mm->mmap_sem);
 
/* We need minimum second+third level pages */
-   ret = mmu_topup_memory_cache(memcache, kvm_mmu_cache_min_pages(kvm),
-KVM_NR_MEM_OBJS);
+   ret = mmu_topup_memory_cache(memcache, kvm_mmu_cache_min_pages(kvm));
if (ret)
return ret;
 
-- 
2.26.0



[PATCH v2 18/21] KVM: arm64: Use common KVM implementation of MMU memory caches

2020-06-22 Thread Sean Christopherson
Move to the common MMU memory cache implementation now that the common
code and arm64's existing code are semantically compatible.

No functional change intended.

Suggested-by: Christoffer Dall 
Tested-by: Marc Zyngier 
Signed-off-by: Sean Christopherson 
---
 arch/arm64/include/asm/Kbuild  |  1 -
 arch/arm64/include/asm/kvm_host.h  | 12 ---
 arch/arm64/include/asm/kvm_types.h |  8 +
 arch/arm64/kvm/mmu.c   | 51 ++
 4 files changed, 18 insertions(+), 54 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_types.h

diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index 35a68155cd0e..ff9cbb631212 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -1,6 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
 generic-y += early_ioremap.h
-generic-y += kvm_types.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += qrwlock.h
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 335170b59899..23d1f41548f5 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -97,18 +97,6 @@ struct kvm_arch {
bool return_nisv_io_abort_to_user;
 };
 
-#define KVM_NR_MEM_OBJS 40
-
-/*
- * We don't want allocation failures within the mmu code, so we preallocate
- * enough memory for a single page fault in a cache.
- */
-struct kvm_mmu_memory_cache {
-   int nobjs;
-   gfp_t gfp_zero;
-   void *objects[KVM_NR_MEM_OBJS];
-};
-
 struct kvm_vcpu_fault_info {
u32 esr_el2;        /* Hyp Syndrom Register */
u64 far_el2;        /* Hyp Fault Address Register */
diff --git a/arch/arm64/include/asm/kvm_types.h 
b/arch/arm64/include/asm/kvm_types.h
new file mode 100644
index ..9a126b9e2d7c
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_types.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_ARM64_KVM_TYPES_H
+#define _ASM_ARM64_KVM_TYPES_H
+
+#define KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE 40
+
+#endif /* _ASM_ARM64_KVM_TYPES_H */
+
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 688213ef34f0..976405e2fbb2 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -124,37 +124,6 @@ static void stage2_dissolve_pud(struct kvm *kvm, 
phys_addr_t addr, pud_t *pudp)
put_page(virt_to_page(pudp));
 }
 
-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, int min)
-{
-   void *page;
-
-   if (cache->nobjs >= min)
-   return 0;
-   while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-   page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT |
-  cache->gfp_zero);
-   if (!page)
-   return -ENOMEM;
-   cache->objects[cache->nobjs++] = page;
-   }
-   return 0;
-}
-
-static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
-{
-   while (mc->nobjs)
-   free_page((unsigned long)mc->objects[--mc->nobjs]);
-}
-
-static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
-{
-   void *p;
-
-   BUG_ON(!mc || !mc->nobjs);
-   p = mc->objects[--mc->nobjs];
-   return p;
-}
-
 static void clear_stage2_pgd_entry(struct kvm *kvm, pgd_t *pgd, phys_addr_t 
addr)
 {
pud_t *pud_table __maybe_unused = stage2_pud_offset(kvm, pgd, 0UL);
@@ -1024,7 +993,7 @@ static pud_t *stage2_get_pud(struct kvm *kvm, struct 
kvm_mmu_memory_cache *cache
if (stage2_pgd_none(kvm, *pgd)) {
if (!cache)
return NULL;
-   pud = mmu_memory_cache_alloc(cache);
+   pud = kvm_mmu_memory_cache_alloc(cache);
stage2_pgd_populate(kvm, pgd, pud);
get_page(virt_to_page(pgd));
}
@@ -1045,7 +1014,7 @@ static pmd_t *stage2_get_pmd(struct kvm *kvm, struct 
kvm_mmu_memory_cache *cache
if (stage2_pud_none(kvm, *pud)) {
if (!cache)
return NULL;
-   pmd = mmu_memory_cache_alloc(cache);
+   pmd = kvm_mmu_memory_cache_alloc(cache);
stage2_pud_populate(kvm, pud, pmd);
get_page(virt_to_page(pud));
}
@@ -1251,7 +1220,7 @@ static int stage2_set_pte(struct kvm *kvm, struct 
kvm_mmu_memory_cache *cache,
if (stage2_pud_none(kvm, *pud)) {
if (!cache)
return 0; /* ignore calls from kvm_set_spte_hva */
-   pmd = mmu_memory_cache_alloc(cache);
+   pmd = kvm_mmu_memory_cache_alloc(cache);
stage2_pud_populate(kvm, pud, pmd);
get_page(virt_to_page(pud));
}
@@ -1276,7 +1245,7 @@ static int stage2_set_pte(struct kvm *kvm, struct 
kvm_mmu_memory_cache *cache,
if (pmd_none(*pmd)) {
if (!cache)
return 0; /* ignore calls from kvm_set_spte_hva */
-   pte = 

[PATCH v2 17/21] KVM: arm64: Use common code's approach for __GFP_ZERO with memory caches

2020-06-22 Thread Sean Christopherson
Add a "gfp_zero" member to arm64's 'struct kvm_mmu_memory_cache' to make
the struct and its usage compatible with the common 'struct
kvm_mmu_memory_cache' in linux/kvm_host.h.  This will minimize code
churn when arm64 moves to the common implementation in a future patch, at
the cost of temporarily having somewhat silly code.

Note, GFP_PGTABLE_USER is equivalent to GFP_KERNEL_ACCOUNT | GFP_ZERO:

  #define GFP_PGTABLE_USER  (GFP_PGTABLE_KERNEL | __GFP_ACCOUNT)
  |
  -> #define GFP_PGTABLE_KERNEL(GFP_KERNEL | __GFP_ZERO)

  == GFP_KERNEL | __GFP_ACCOUNT | __GFP_ZERO

versus

  #define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT)

with __GFP_ZERO explicitly OR'd in

  == GFP_KERNEL | __GFP_ACCOUNT | __GFP_ZERO

No functional change intended.

Tested-by: Marc Zyngier 
Signed-off-by: Sean Christopherson 
---
 arch/arm64/include/asm/kvm_host.h | 1 +
 arch/arm64/kvm/arm.c  | 2 ++
 arch/arm64/kvm/mmu.c  | 5 +++--
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index c3e6fcc664b1..335170b59899 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -105,6 +105,7 @@ struct kvm_arch {
  */
 struct kvm_mmu_memory_cache {
int nobjs;
+   gfp_t gfp_zero;
void *objects[KVM_NR_MEM_OBJS];
 };
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 90cb90561446..1016635b3782 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -270,6 +270,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
vcpu->arch.target = -1;
bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
 
+   vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO;
+
/* Set up the timer */
kvm_timer_vcpu_init(vcpu);
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 9398b66f8a87..688213ef34f0 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -131,7 +131,8 @@ static int mmu_topup_memory_cache(struct 
kvm_mmu_memory_cache *cache, int min)
if (cache->nobjs >= min)
return 0;
while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-   page = (void *)__get_free_page(GFP_PGTABLE_USER);
+   page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT |
+  cache->gfp_zero);
if (!page)
return -ENOMEM;
cache->objects[cache->nobjs++] = page;
@@ -1342,7 +1343,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t 
guest_ipa,
phys_addr_t addr, end;
int ret = 0;
unsigned long pfn;
-   struct kvm_mmu_memory_cache cache = { 0, };
+   struct kvm_mmu_memory_cache cache = { 0, __GFP_ZERO, };
 
end = (guest_ipa + size + PAGE_SIZE - 1) & PAGE_MASK;
pfn = __phys_to_pfn(pa);
-- 
2.26.0



[PATCH v2 20/21] KVM: MIPS: Account pages used for GPA page tables

2020-06-22 Thread Sean Christopherson
Use GFP_KERNEL_ACCOUNT instead of GFP_KERNEL when allocating pages for
the GPA page tables.  The primary motivation for accounting the
allocations is to align with the common KVM memory cache helpers in
preparation for moving to the common implementation in a future patch.
The actual accounting is a bonus side effect.

Signed-off-by: Sean Christopherson 
---
 arch/mips/kvm/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index 94562c54b930..41a4a063a730 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -32,7 +32,7 @@ static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache 
*cache, int min)
if (cache->nobjs >= min)
return 0;
while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-   page = (void *)__get_free_page(GFP_KERNEL);
+   page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
if (!page)
return -ENOMEM;
cache->objects[cache->nobjs++] = page;
-- 
2.26.0



[PATCH v2 14/21] KVM: Move x86's version of struct kvm_mmu_memory_cache to common code

2020-06-22 Thread Sean Christopherson
Move x86's 'struct kvm_mmu_memory_cache' to common code in anticipation
of moving the entire x86 implementation code to common KVM and reusing
it for arm64 and MIPS.  Add a new architecture specific asm/kvm_types.h
to control the existence and parameters of the struct.  The new header
is needed to avoid a chicken-and-egg problem with asm/kvm_host.h as all
architectures define instances of the struct in their vCPU structs.

Add an asm-generic version of kvm_types.h to avoid having empty files on
PPC and s390 in the long term, and for arm64 and mips in the short term.

Suggested-by: Christoffer Dall 
Signed-off-by: Sean Christopherson 
---
 arch/arm64/include/asm/Kbuild|  1 +
 arch/mips/include/asm/Kbuild |  1 +
 arch/powerpc/include/asm/Kbuild  |  1 +
 arch/s390/include/asm/Kbuild |  1 +
 arch/x86/include/asm/kvm_host.h  | 13 -
 arch/x86/include/asm/kvm_types.h |  7 +++
 include/asm-generic/kvm_types.h  |  5 +
 include/linux/kvm_types.h| 19 +++
 8 files changed, 35 insertions(+), 13 deletions(-)
 create mode 100644 arch/x86/include/asm/kvm_types.h
 create mode 100644 include/asm-generic/kvm_types.h

diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index ff9cbb631212..35a68155cd0e 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 generic-y += early_ioremap.h
+generic-y += kvm_types.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += qrwlock.h
diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 8643d313890e..397e6d24d2ab 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -5,6 +5,7 @@ generated-y += syscall_table_64_n32.h
 generated-y += syscall_table_64_n64.h
 generated-y += syscall_table_64_o32.h
 generic-y += export.h
+generic-y += kvm_types.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += parport.h
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index dadbcf3a0b1e..2d444d09b553 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -4,6 +4,7 @@ generated-y += syscall_table_64.h
 generated-y += syscall_table_c32.h
 generated-y += syscall_table_spu.h
 generic-y += export.h
+generic-y += kvm_types.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += vtime.h
diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild
index 83f6e85de7bc..319efa0e6d02 100644
--- a/arch/s390/include/asm/Kbuild
+++ b/arch/s390/include/asm/Kbuild
@@ -6,5 +6,6 @@ generated-y += unistd_nr.h
 
 generic-y += asm-offsets.h
 generic-y += export.h
+generic-y += kvm_types.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 67b84aa2984e..70832aa762e5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -193,8 +193,6 @@ struct x86_exception;
 enum x86_intercept;
 enum x86_intercept_stage;
 
-#define KVM_NR_MEM_OBJS 40
-
 #define KVM_NR_DB_REGS 4
 
 #define DR6_BD (1 << 13)
@@ -245,17 +243,6 @@ enum x86_intercept_stage;
 
 struct kvm_kernel_irq_routing_entry;
 
-/*
- * We don't want allocation failures within the mmu code, so we preallocate
- * enough memory for a single page fault in a cache.
- */
-struct kvm_mmu_memory_cache {
-   int nobjs;
-   gfp_t gfp_zero;
-   struct kmem_cache *kmem_cache;
-   void *objects[KVM_NR_MEM_OBJS];
-};
-
 /*
  * the pages used as guest page table on soft mmu are tracked by
  * kvm_memory_slot.arch.gfn_track which is 16 bits, so the role bits used
diff --git a/arch/x86/include/asm/kvm_types.h b/arch/x86/include/asm/kvm_types.h
new file mode 100644
index ..08f1b57d3b62
--- /dev/null
+++ b/arch/x86/include/asm/kvm_types.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_KVM_TYPES_H
+#define _ASM_X86_KVM_TYPES_H
+
+#define KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE 40
+
+#endif /* _ASM_X86_KVM_TYPES_H */
diff --git a/include/asm-generic/kvm_types.h b/include/asm-generic/kvm_types.h
new file mode 100644
index ..2a82daf110f1
--- /dev/null
+++ b/include/asm-generic/kvm_types.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_GENERIC_KVM_TYPES_H
+#define _ASM_GENERIC_KVM_TYPES_H
+
+#endif
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index 68e84cf42a3f..a7580f69dda0 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -20,6 +20,8 @@ enum kvm_mr_change;
 
 #include <linux/types.h>
 
+#include <asm/kvm_types.h>
+
 /*
  * Address types:
  *
@@ -58,4 +60,21 @@ struct gfn_to_pfn_cache {
bool dirty;
 };
 
+#ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
+/*
+ * Memory caches are used to preallocate memory ahead of various MMU flows,
+ * e.g. page fault handlers.  Gracefully handling allocation failures deep in
+ * MMU flows is problematic, as is triggering reclaim, 

[PATCH v2 09/21] KVM: x86/mmu: Separate the memory caches for shadow pages and gfn arrays

2020-06-22 Thread Sean Christopherson
Use separate caches for allocating shadow pages versus gfn arrays.  This
sets the stage for specifying __GFP_ZERO when allocating shadow pages
without incurring extra cost for gfn arrays.

No functional change intended.

Reviewed-by: Ben Gardon 
Signed-off-by: Sean Christopherson 
---
 arch/x86/include/asm/kvm_host.h |  3 ++-
 arch/x86/kvm/mmu/mmu.c  | 15 ++-
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7b6ac8fad9c2..376e1653ac41 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -636,7 +636,8 @@ struct kvm_vcpu_arch {
struct kvm_mmu *walk_mmu;
 
struct kvm_mmu_memory_cache mmu_pte_list_desc_cache;
-   struct kvm_mmu_memory_cache mmu_page_cache;
+   struct kvm_mmu_memory_cache mmu_shadow_page_cache;
+   struct kvm_mmu_memory_cache mmu_gfn_array_cache;
struct kvm_mmu_memory_cache mmu_page_header_cache;
 
/*
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 451e0365e5dd..d245acece3cd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1108,8 +1108,12 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
   1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
if (r)
return r;
-   r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_cache,
-  2 * PT64_ROOT_MAX_LEVEL);
+   r = mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
+  PT64_ROOT_MAX_LEVEL);
+   if (r)
+   return r;
+   r = mmu_topup_memory_cache(&vcpu->arch.mmu_gfn_array_cache,
+  PT64_ROOT_MAX_LEVEL);
if (r)
return r;
return mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
@@ -1119,7 +1123,8 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
 static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 {
mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
-   mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+   mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
+   mmu_free_memory_cache(&vcpu->arch.mmu_gfn_array_cache);
mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
 }
 
@@ -2096,9 +2101,9 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct 
kvm_vcpu *vcpu, int direct
struct kvm_mmu_page *sp;
 
sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
-   sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache);
+   sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
if (!direct)
-   sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache);
+   sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_gfn_array_cache);
set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
 
/*
-- 
2.26.0



Re: [PATCH v4 3/5] stack: Optionally randomize kernel stack offset each syscall

2020-06-22 Thread Jann Horn
On Mon, Jun 22, 2020 at 9:31 PM Kees Cook  wrote:
> This provides the ability for architectures to enable kernel stack base
> address offset randomization. This feature is controlled by the boot
> param "randomize_kstack_offset=on/off", with its default value set by
> CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.
[...]
> +#define add_random_kstack_offset() do {\
> +   if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
> +   &randomize_kstack_offset)) {\
> +   u32 offset = this_cpu_read(kstack_offset);  \
> +   u8 *ptr = __builtin_alloca(offset & 0x3FF); \
> +   asm volatile("" : "=m"(*ptr));  \
> +   }   \
> +} while (0)

clang generates better code here if the mask is stack-aligned -
otherwise it needs to round the stack pointer / the offset:

$ cat alloca_align.c
#include <alloca.h>
void callee(void);

void alloca_blah(unsigned long rand) {
  asm volatile(""::"r"(alloca(rand & MASK)));
  callee();
}
$ clang -O3 -c -o alloca_align.o alloca_align.c -DMASK=0x3ff
$ objdump -d alloca_align.o
[...]
   0:  55                      push   %rbp
   1:  48 89 e5                mov    %rsp,%rbp
   4:  81 e7 ff 03 00 00       and    $0x3ff,%edi
   a:  83 c7 0f                add    $0xf,%edi
   d:  83 e7 f0                and    $0xfffffff0,%edi
  10:  48 89 e0                mov    %rsp,%rax
  13:  48 29 f8                sub    %rdi,%rax
  16:  48 89 c4                mov    %rax,%rsp
  19:  e8 00 00 00 00          callq  1e 
  1e:  48 89 ec                mov    %rbp,%rsp
  21:  5d                      pop    %rbp
  22:  c3                      retq
$ clang -O3 -c -o alloca_align.o alloca_align.c -DMASK=0x3f0
$ objdump -d alloca_align.o
[...]
   0:  55                      push   %rbp
   1:  48 89 e5                mov    %rsp,%rbp
   4:  48 89 e0                mov    %rsp,%rax
   7:  81 e7 f0 03 00 00       and    $0x3f0,%edi
   d:  48 29 f8                sub    %rdi,%rax
  10:  48 89 c4                mov    %rax,%rsp
  13:  e8 00 00 00 00          callq  18 
  18:  48 89 ec                mov    %rbp,%rsp
  1b:  5d                      pop    %rbp
  1c:  c3                      retq
$

(From a glance at the assembly, gcc seems to always assume that the
length may be misaligned.)

Maybe this should be something along the lines of
__builtin_alloca(offset & (0x3ff & ARCH_STACK_ALIGN_MASK)) (with
appropriate definitions of the stack alignment mask depending on the
architecture's choice of stack alignment for kernel code).
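
For illustration, a sketch of that suggestion. ARCH_STACK_ALIGN and
its mask are assumed example values here, not an existing kernel
interface; the real constant would come from each architecture's
kernel stack alignment, and the other names follow the quoted patch:

#define ARCH_STACK_ALIGN	16	/* assumed: 16-byte stack alignment */
#define ARCH_STACK_ALIGN_MASK	(~(ARCH_STACK_ALIGN - 1))

/*
 * 0x3FF & ~0xF == 0x3F0, so the value handed to __builtin_alloca() is
 * already stack-aligned and clang can drop the add/and rounding seen
 * in the first dump above.
 */
#define add_random_kstack_offset() do {					\
	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
				&randomize_kstack_offset)) {		\
		u32 offset = this_cpu_read(kstack_offset);		\
		u8 *ptr = __builtin_alloca(offset &			\
					(0x3FF & ARCH_STACK_ALIGN_MASK)); \
		asm volatile("" : "=m"(*ptr));				\
	}								\
} while (0)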

