Re: [RFC][PATCH] fs: set xattrs in initramfs from regular files

2018-11-23 Thread Casey Schaufler
On 11/22/2018 7:49 AM, Roberto Sassu wrote:
> Although rootfs (tmpfs) supports xattrs, they are not set due to the
> limitation of the cpio format. A new format called 'newcx' was proposed to
> overcome this limitation.
>
> However, it looks like that adding a new format is not simple: 15 kernel
> patches; user space tools must support the new format; mistakes made in the
> past should be avoided; it is unclear whether the kernel should switch from
> cpio to tar.
>
> The aim of this patch is to provide the same functionality without
> introducing a new format. The value of xattrs is placed in regular files
> having the same file name as the files xattrs are added to, plus a
> separator and the xattr name (.xattr-).
>
> Example:
>
> '/bin/cat.xattr-security.ima' is the name of a file containing the value of
> the security.ima xattr to be added to /bin/cat.
>
> At kernel initialization time, the kernel iterates over the rootfs
> filesystem, and if it encounters files with the '.xattr-' separator, it
> reads the content and adds the xattr to the file without the suffix.

No.

Really, no.

It would be incredibly easy to use this mechanism to break
into systems.
 

> This proposal requires that LSMs and IMA allow the read and setxattr
> operations. This should not be a concern since: files with xattr values
> are not parsed by the kernel; user space processes are not yet executed.
>
> It would be possible to include all xattrs in the same file, but this
> increases the risk of the kernel being compromised by parsing the content.

The kernel mustn't do this.

>
> Signed-off-by: Roberto Sassu 
> ---
>  fs/Makefile|   2 +-
>  fs/readxattr.c | 171 +
>  include/linux/fs.h |   2 +
>  init/main.c|   1 +
>  4 files changed, 175 insertions(+), 1 deletion(-)
>  create mode 100644 fs/readxattr.c
>
> diff --git a/fs/Makefile b/fs/Makefile
> index 293733f61594..738e1a4e4aff 100644
> --- a/fs/Makefile
> +++ b/fs/Makefile
> @@ -12,7 +12,7 @@ obj-y :=open.o read_write.o file_table.o super.o \
>   attr.o bad_inode.o file.o filesystems.o namespace.o \
>   seq_file.o xattr.o libfs.o fs-writeback.o \
>   pnode.o splice.o sync.o utimes.o d_path.o \
> - stack.o fs_struct.o statfs.o fs_pin.o nsfs.o
> + stack.o fs_struct.o statfs.o fs_pin.o nsfs.o readxattr.o
>  
>  ifeq ($(CONFIG_BLOCK),y)
>  obj-y += buffer.o block_dev.o direct-io.o mpage.o
> diff --git a/fs/readxattr.c b/fs/readxattr.c
> new file mode 100644
> index ..01838f6f1e92
> --- /dev/null
> +++ b/fs/readxattr.c
> @@ -0,0 +1,171 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2018 Huawei Technologies Duesseldorf GmbH
> + *
> + * Author: Roberto Sassu 
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation, version 2 of the
> + * License.
> + *
> + * File: readxattr.c
> + *  Read extended attributes from regular files in the initial ram disk
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "internal.h"
> +
> +#define SETXATTR_FILENAME ".setxattr"
> +#define FILENAME_XATTR_SEP ".xattr-"
> +
> +
> +struct readdir_callback {
> + struct dir_context ctx;
> + struct path *path;
> +};
> +
> +LIST_HEAD(dir_list);
> +
> +struct dir_path {
> + struct list_head next;
> + struct path path;
> +};
> +
> +static int __init read_set_xattr(struct dir_context *__ctx, const char *name,
> +  int namelen, loff_t offset, u64 ino,
> +  unsigned int d_type)
> +{
> + struct readdir_callback *ctx = container_of(__ctx, typeof(*ctx), ctx);
> + struct path *dir = ctx->path, source_path, target_path;
> + char filename[NAME_MAX + 1], *xattrname, *separator;
> + struct dir_path *subdir;
> + struct file *file;
> + void *datap;
> + loff_t size;
> + int result;
> +
> + if (!strcmp(name, ".") || !strcmp(name, ".."))
> + return 0;
> +
> + result = vfs_path_lookup(dir->dentry, dir->mnt, name, 0, &source_path);
> + if (result)
> + return 0;
> +
> + size = i_size_read(source_path.dentry->d_inode);
> + if (size > XATTR_SIZE_MAX)
> + goto out;
> +
> + if (source_path.dentry->d_inode->i_sb != dir->dentry->d_inode->i_sb)
> + goto out;
> +
> + if (!S_ISREG(source_path.dentry->d_inode->i_mode) &&
> + !S_ISDIR(source_path.dentry->d_inode->i_mode))
> + goto out;
> +
> + if (S_ISREG(source_path.dentry->d_inode->i_mode)) {
> + separator = strstr(name, FILENAME_XATTR_SEP);
> + if (!separator)
> + goto out;
> +
> + xattrname = separator + sizeof(FILENAME_XATTR_SEP) - 1;
> + if (strlen(xattrname

Re: [GIT PULL] Power management fixes for v4.20-rc4

2018-11-23 Thread pr-tracker-bot
The pull request you sent on Fri, 23 Nov 2018 11:10:58 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git pm-4.20-rc4

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/b88af994872418f0a98db6f4a9bae849315a99b0

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT PULL] ACPI fix for v4.20-rc4

2018-11-23 Thread pr-tracker-bot
The pull request you sent on Fri, 23 Nov 2018 11:12:02 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git 
> acpi-4.20-rc4

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/a03bac580ae743d5900af626ac63f7f8cd85def9

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [PATCH] dmaengine: ti: omap-dma: Configure LCH_TYPE for OMAP1

2018-11-23 Thread Aaro Koskinen
Hi,

On Fri, Nov 23, 2018 at 02:35:04PM +0200, Peter Ujfalusi wrote:
> On 22/11/2018 17.12, Russell King - ARM Linux wrote:
> > I'm also not sure about this:
> > 
> > if (cpu_is_omap15xx())
> > end++;
> > 
> > in dma_dest_len() - is that missing from the omap-dma driver?  It looks
> > like a work-around for some problem on OMAP15xx, but I can't make sense
> > about why it's in the UDC driver rather than the legacy DMA driver.
> 
> afaik no other legacy drivers were doing similar thing, this must be
> something which is needed for the omap_udc driver to fix up something?

Here's the patch that added it: 
https://marc.info/?l=linux-omap&m=119634396324221&w=2

"Make DMA-OUT behave on the 1510 ... the 1510 CPC register was just
off-by-one with respect to the 1611 CDAC"

A.


Re: [PATCH for v4.14] zram: close udev startup race condition as default groups

2018-11-23 Thread Sasha Levin

On Fri, Nov 23, 2018 at 03:25:50PM +0900, Minchan Kim wrote:

commit fef912bf860e upstream.
commit 98af4d4df889 upstream.

I got a report from Howard Chen that he saw zram and sysfs race(ie,
zram block device file is created but sysfs for it isn't yet)
when he tried to create new zram devices via hotadd knob.

v4.20 kernel fixes it by [1, 2] but it's too large size to merge
into -stable so this patch fixes the problem by registering defualt
group by Greg KH's approach[3].

This patch should be applied to every stable tree [3.16+] currently
existing from kernel.org because the problem was introduced at 2.6.37
by [4].

[1] fef912bf860e, block: genhd: add 'groups' argument to device_add_disk
[2] 98af4d4df889, zram: register default groups with device_add_disk()
[3] http://kroah.com/log/blog/2013/06/26/how-to-create-a-sysfs-file-correctly/
[4] 33863c21e69e9, Staging: zram: Replace ioctls with sysfs interface

Cc: Sergey Senozhatsky 
Cc: Hannes Reinecke 
Tested-by: Howard Chen 
Signed-off-by: Minchan Kim 


I've queued all 4 to their respective branches, thank you.

--
Thanks,
Sasha


Re: [GIT PULL] GPIO fixes for the v4.20 series

2018-11-23 Thread pr-tracker-bot
The pull request you sent on Fri, 23 Nov 2018 09:12:38 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio.git 
> tags/gpio-v4.20-2

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/e6005d3c42336074de3745718ac85807dd6e1e6a

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT PULL] MMC fixes for v4.20-rc4

2018-11-23 Thread pr-tracker-bot
The pull request you sent on Fri, 23 Nov 2018 08:25:18 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc.git tags/mmc-v4.20-rc2

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/dcd3aa31dcdd6d8eae8d4771c44aeb3b1fec995a

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


[GIT PULL] Ceph fix for 4.20-rc4

2018-11-23 Thread Ilya Dryomov
Hi Linus,

The following changes since commit 9ff01193a20d391e8dbce4403dd5ef87c7eaaca6:

  Linux 4.20-rc3 (2018-11-18 13:33:44 -0800)

are available in the Git repository at:

  https://github.com/ceph/ceph-client.git tags/ceph-for-4.20-rc4

for you to fetch changes up to 7e241f647dc7087a0401418a187f3f5b527cc690:

  libceph: fall back to sendmsg for slab pages (2018-11-19 17:59:47 +0100)


A messenger fix, marked for stable.


Ilya Dryomov (1):
  libceph: fall back to sendmsg for slab pages

 net/ceph/messenger.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)


Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-23 Thread Linus Torvalds
On Fri, Nov 23, 2018 at 10:39 AM Andy Lutomirski  wrote:
>
> What is memcpy_to_io even supposed to do?  I’m guessing it’s defined as 
> something like “copy this data to IO space using at most long-sized writes, 
> all aligned, and writing each byte exactly once, in order.”  That sounds... 
> dubiously useful.

We've got hundreds of users of it, so it's fairly common..

> I could see a function that writes to aligned memory in specified-sized 
> chunks.

We have that. It's called "__iowrite{32,64}_copy()". It has very few users.

  Linus


Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-23 Thread Andy Lutomirski



> On Nov 23, 2018, at 10:42 AM, Linus Torvalds  
> wrote:
> 
> On Fri, Nov 23, 2018 at 8:36 AM Linus Torvalds
>  wrote:
>> 
>> Let me write a generic routine in lib/iomap_copy.c (which already does
>> the "user specifies chunk size" cases), and hook it up for x86.
> 
> Something like this?
> 
> ENTIRELY UNTESTED! It might not compile. Seriously. And if it does
> compile, it might not work.
> 
> And this doesn't actually do the memset_io() function at all, just the
> memcpy ones.
> 
> Finally, it's worth noting that on x86, we have this:
> 
>  /*
>   * override generic version in lib/iomap_copy.c
>   */
>  ENTRY(__iowrite32_copy)
>  movl %edx,%ecx
>  rep movsd
>  ret
>  ENDPROC(__iowrite32_copy)
> 
> because back in 2006, we did this:
> 
>[PATCH] Add faster __iowrite32_copy routine for x86_64
> 
>This assembly version is measurably faster than the generic version in
>lib/iomap_copy.c.
> 
> which actually implies that "rep movsd" is faster than doing
> __raw_writel() by hand.
> 
> So it is possible that this should all be arch-specific code rather
> than that butt-ugly "generic" code I wrote in this patch.
> 
> End result: I'm not really all that  happy about this patch, but it's
> perhaps worth testing, and it's definitely worth discussing. Because
> our current memcpy_{to,from}io() is truly broken garbage.
> 
>   

What is memcpy_to_io even supposed to do?  I’m guessing it’s defined as 
something like “copy this data to IO space using at most long-sized writes, all 
aligned, and writing each byte exactly once, in order.”  That sounds... 
dubiously useful.  I could see a function that writes to aligned memory in 
specified-sized chunks.  And I can see a use for a function to just write it in 
whatever size chunks the architecture thinks is fastest, and *that* should 
probably use MOVDIR64B.

Or is there some subtlety I’m missing?

Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

2018-11-23 Thread Rich Felker
On Fri, Nov 23, 2018 at 12:52:21PM -0500, Mathieu Desnoyers wrote:
> - On Nov 23, 2018, at 12:30 PM, Rich Felker dal...@libc.org wrote:
> 
> > On Fri, Nov 23, 2018 at 12:05:20PM -0500, Mathieu Desnoyers wrote:
> >> - On Nov 23, 2018, at 9:28 AM, Rich Felker dal...@libc.org wrote:
> >> [...]
> >> > 
> >> > Absolutely. As long as it's in libc, implicit destruction will happen.
> >> > Actually I think the glibc code shound unconditionally unregister the
> >> > rseq address at exit (after blocking signals, so no application code
> >> > can run) in case a third-party rseq library was linked and failed to
> >> > do so before thread exit (e.g. due to mismatched ref counts) rather
> >> > than respecting the reference count, since it knows it's the last
> >> > user. This would make potentially-buggy code safer.
> >> 
> >> OK, let me go ahead with a few ideas/questions along that path.
> > ^^^
> >> 
> >> Let's say our stated goal is to let the "exit" system call from the
> >> glibc thread exit path perform rseq unregistration (without explicit
> >> unregistration beforehand). Let's look at what we need.
> > 
> > This is not "along that path". The above-quoted text is not about
> > assuming it's safe to make SYS_exit without unregistering the rseq
> > object, but rather about glibc being able to perform the
> > rseq-unregister syscall without caring about reference counts, since
> > it knows no other code that might depend on rseq can run after it.
> 
> When saying "along that path", what I mean is: if we go in that direction,
> then we should look into going all the way there, and rely on thread
> exit to implicitly unregister the TLS area.
> 
> Do you see any reason for doing an explicit unregistration at thread
> exit rather than simply rely on the exit system call ?

Whether this is needed is an implementation detail of glibc that
should be permitted to vary between versions. Unless glibc wants to
promise that it would become a public guarantee, it's not part of the
discussion around the API/ABI. Only part of the discussion around
implementation internals of the glibc rseq stuff.

Of course I may be biased thinking application code should not assume
this since it's not true on musl -- for detached threads, the thread
frees its own stack before exiting (and thus has to unregister
set_tid_address and set_robustlist before exiting).

> >> First, we need the TLS area to be valid until the exit system call
> >> is invoked by the thread. If glibc defines __rseq_abi as a weak symbol,
> >> I'm not entirely sure we can guarantee the IE model if another library
> >> gets its own global-dynamic weak symbol elected at execution time. Would
> >> it be better to switch to a "strong" symbol for the glibc __rseq_abi
> >> rather than weak ?
> > 
> > This doesn't help; still whichever comes first in link order would
> > override. Either way __rseq_abi would be in static TLS, though,
> > because any dynamically-loaded library is necessarily loaded after
> > libc, which is loaded at initial exec time.
> 
> OK, AFAIU so you argue for leaving the __rseq_abi symbol "weak". Just making
> sure I correctly understand your position.

I don't think it matters, and I don't think making it weak is
meaningful or useful (weak in a shared library is largely meaningless)
but maybe I'm missing something here.

> Something can be technically correct based on the current implementation,
> but fragile with respect to future changes. We need to carefully distinguish
> between the two when exposing ABIs.

Yes.

> >> There has been presumptions about signals being blocked when the thread
> >> exits throughout this email thread. Out of curiosity, what code is
> >> responsible for disabling signals in this situation ?
> 
> This question is still open.

I can't find it -- maybe it's not done in glibc. It is in musl, and I
assumed glibc would also do it, because otherwise it's possible to see
some inconsistent states from signal handlers. Maybe these are all
undefined due to AS-unsafety of pthread_exit, but I think you can
construct examples where something could be observably wrong without
breaking any rules.

> > Related to this,
> >> is it valid to access a IE model TLS variable from a signal handler at
> >> _any_ point where the signal handler nests over thread's execution ?
> >> This includes early start and just before invoking the exit system call.
> > 
> > It should be valid to access *any* TLS object like this, but the
> > standards don't cover it well. Right now access to dynamic TLS from
> > signal handlers is unsafe in glibc, but static is safe.
> 
> Which is a shame for the lttng-ust tracer, which needs global-dynamic
> TLS variables so it can be dlopen'd, but aims at allowing tracing from
> signal handlers. It looks like due to limitations of global-dynamic
> TLS, tracing from instrumented signal handlers with lttng-ust tracepoints
> could crash the process if the signal handl

Re: [for-next][PATCH 08/18] parisc: function_graph: Simplify with function_graph_entry()

2018-11-23 Thread Sasha Levin

On Fri, Nov 23, 2018 at 12:12:53PM -0500, Steven Rostedt wrote:

On Fri, 23 Nov 2018 10:06:05 +0100
Helge Deller  wrote:



> How should we proceed with this patch?

My suggestion, although I didn't looked too much on it:
Apply it to v4.9 and higher only.
I think I started fixing trace functionality on parisc around 4.6,
which is probably why applying it fails on v4.4 and v3.x


The problem is, if you backport the generic patches, it will completely
break any arch that isn't updated. This also includes the archs that
are no longer supported upstream, as they were not changed to handle
the generic updates either.


Does this mean that someone (Steve) will send a backport of this to all
relevant stable trees? Right now it looks like the series will randomly
apply on a mix of trees, which can't be good.

--
Thanks,
Sasha


Re: [PATCH 2/2] ARM: module: Fix function kallsyms on Thumb-2

2018-11-23 Thread Dave Martin
On Mon, Nov 19, 2018 at 05:25:13PM +0100, Vincent Whitchurch wrote:
> Thumb-2 functions have the lowest bit set in the symbol value in the
> symtab.  When kallsyms are generated for the vmlinux, the kallsyms are
> generated from the output of nm, and nm clears the lowest bit.
> 
>  $ arm-linux-gnueabihf-readelf -a vmlinux | grep show_interrupts
>   95947: 8015dc89   686 FUNCGLOBAL DEFAULT2 show_interrupts
>  $ arm-linux-gnueabihf-nm vmlinux | grep show_interrupts
>  8015dc88 T show_interrupts
>  $ cat /proc/kallsyms | grep show_interrupts
>  8015dc88 T show_interrupts
> 
> However, for modules, the kallsyms uses the values in the symbol table
> without modification, so for functions in modules, the lowest bit is set
> in kallsyms.
> 
>  $ arm-linux-gnueabihf-readelf -a drivers/net/tun.ko | grep tun_get_socket
> 333: 2d4d36 FUNCGLOBAL DEFAULT1 tun_get_socket
>  $ arm-linux-gnueabihf-nm drivers/net/tun.ko | grep tun_get_socket
>  2d4c T tun_get_socket
>  $ cat /proc/kallsyms | grep tun_get_socket
>  7f802d4d t tun_get_socket  [tun]
> 
> Because of this, the symbol+offset of the crashing instruction shown in
> oopses is incorrect when the crash is in a module.  For example, given a
> tun_get_socket which starts like this,
> 
>  2d4c :
>  2d4c:   6943ldr r3, [r0, #20]
>  2d4e:   4a07ldr r2, [pc, #28]
>  2d50:   4293cmp r3, r2
> 
> a crash when tun_get_socket is called with NULL results in:
> 
>  PC is at tun_xdp+0xa3/0xa4 [tun]
>  pc : [<7f802d4c>]
> 
> As can be seen, the "PC is at" line reports the wrong symbol name, and
> the symbol+offset will point to the wrong source line if it is passed to
> gdb.
> 
> To solve this, add a way for archs to fixup the reading of these module
> kallsyms values, and use that to clear the lowest bit for function
> symbols on Thumb-2.
> 
> After the fix:
> 
>  # cat /proc/kallsyms | grep tun_get_socket
>  7f802d4c t tun_get_socket   [tun]
> 
>  PC is at tun_get_socket+0x0/0x24 [tun]
>  pc : [<7f802d4c>]
> 
> Signed-off-by: Vincent Whitchurch 
> ---
> v4: Split out st_value overwrite change.  Add HAVE* macro to avoid function 
> call.
> v3: Do not overwrite st_value
> v2: Fix build warning with !MODULES
> 
>  arch/arm/include/asm/module.h | 11 +++
>  include/linux/module.h|  7 +++
>  kernel/module.c   | 25 ++---
>  3 files changed, 32 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/arm/include/asm/module.h b/arch/arm/include/asm/module.h
> index 9e81b7c498d8..e2ccec651e71 100644
> --- a/arch/arm/include/asm/module.h
> +++ b/arch/arm/include/asm/module.h
> @@ -61,4 +61,15 @@ u32 get_module_plt(struct module *mod, unsigned long loc, 
> Elf32_Addr val);
>   MODULE_ARCH_VERMAGIC_ARMTHUMB \
>   MODULE_ARCH_VERMAGIC_P2V
>  
> +#ifdef CONFIG_THUMB2_KERNEL
> +#define HAVE_ARCH_MODULE_KALLSYMS_SYMBOL_VALUE
> +static inline unsigned long module_kallsyms_symbol_value(Elf_Sym *sym)
> +{
> + if (ELF_ST_TYPE(sym->st_info) == STT_FUNC)
> + return sym->st_value & ~1;
> +
> + return sym->st_value;
> +}
> +#endif
> +
>  #endif /* _ASM_ARM_MODULE_H */
> diff --git a/include/linux/module.h b/include/linux/module.h
> index fce6b4335e36..cfc55f358800 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -486,6 +486,13 @@ struct module {
>  #define MODULE_ARCH_INIT {}
>  #endif
>  
> +#ifndef HAVE_ARCH_MODULE_KALLSYMS_SYMBOL_VALUE
> +static inline unsigned long module_kallsyms_symbol_value(Elf_Sym *sym)
> +{
> + return sym->st_value;
> +}
> +#endif
> +
>  extern struct mutex module_mutex;
>  
>  /* FIXME: It'd be nice to isolate modules during init, too, so they
> diff --git a/kernel/module.c b/kernel/module.c
> index 3d86a38b580c..a36d7915ed2b 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3934,6 +3934,9 @@ static const char *get_ksymbol(struct module *mod,
>   /* Scan for closest preceding symbol, and next symbol. (ELF
>  starts real symbols at 1). */
>   for (i = 1; i < kallsyms->num_symtab; i++) {
> + unsigned long thisval = 
> module_kallsyms_symbol_value(&kallsyms->symtab[i]);
> + unsigned long bestval = 
> module_kallsyms_symbol_value(&kallsyms->symtab[best]);
> +
>   if (kallsyms->symtab[i].st_shndx == SHN_UNDEF)
>   continue;
>  
> @@ -3943,21 +3946,21 @@ static const char *get_ksymbol(struct module *mod,
>   || is_arm_mapping_symbol(symname(kallsyms, i)))
>   continue;
>  
> - if (kallsyms->symtab[i].st_value <= addr
> - && kallsyms->symtab[i].st_value > 
> kallsyms->symtab[best].st_value)
> + if (thisval <= addr
> + && thisval > bestval)

Nit: this fits easily on one line now.

>   best = i;

Can we declare bestval outside the loop and update it here, so that
we always have

Re: [PATCH 1/2] scripts/decodecode: Set ARCH when running natively on arm/arm64

2018-11-23 Thread Will Deacon
On Thu, Nov 22, 2018 at 12:14:39PM +, Marc Zyngier wrote:
> When running decodecode natively on arm64, ARCH is likely not to be set,
> and we end-up with .4byte instead of .inst when generating the disassembly.
> 
> Similar effects would occur if running natively on a 32bit ARM platform,
> although that's even less popular.
> 
> A simple workaround is to populate ARCH when it is not set and that we're
> running on an arm/arm64 system.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  scripts/decodecode | 7 +++
>  1 file changed, 7 insertions(+)

Acked-by: Will Deacon 

Will


Re: [RFC PATCH v2] gen/irq: Change the way to differentiate between managed and unmanaged interrupts by bitmap

2018-11-23 Thread Thomas Gleixner
Dou,

On Sat, 10 Nov 2018, Dou Liyang wrote:

sorry for late reply. Travel, conferences 

> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -673,14 +674,17 @@ static int msix_setup_entries(struct pci_dev *dev, void 
> __iomem *base,
> const struct irq_affinity *affd)
>  {
>   struct cpumask *curmsk, *masks = NULL;
> + DECLARE_BITMAP(managed, nvec);

This is not working as it creates a variable length array (VLA). We just
got rid of all VLAs in the kernel and the compiler will catch this and
complain.

>   struct msi_desc *entry;
>   int ret, i;
>  
> + memset(managed, 0, sizeof(managed));
>   if (affd)
> - masks = irq_create_affinity_masks(nvec, affd);
> + masks = irq_create_affinity_masks(nvec, affd, managed);

So rather than changing the world and everything by adding new arguments,
what about the following:

-struct cpumask *irq_create_affinity_masks(int nvec, const struct irq_affinity 
*affd);
+struct irq_affinity_desc * irq_create_affinity_masks(int nvec, struct 
irq_affinity *affd);

struct irq_affinity_desc {
struct cpumask  mask;
unsigned long   flags;
};

or something like that. Let irq_create_affinity_masks() allocate an array
of those and store the information in the flags field. In the allocation
functions just replace the cpumask pointer with a irq_affinity_desc pointer
and hand that through to the irqdesc core, where you can evaluate the
flags. Way less things to modify and the data structure allows to expand
this in the future without touching all the functions ever again.

Can you please work against the tip irq/core branch as that has already
other changes in the affinity spreading code queued?

Thanks,

tglx




Re: [PATCH] KVM: VMX: re-add ple_gap module parameter

2018-11-23 Thread Luiz Capitulino
On Fri, 23 Nov 2018 19:42:53 +0200
Liran Alon  wrote:

> > On 23 Nov 2018, at 19:02, Luiz Capitulino  wrote:
> > 
> > 
> > Apparently, the ple_gap parameter was accidentally removed
> > by commit c8e88717cfc6b36bedea22368d97667446318291. Add it
> > back.
> > 
> > Signed-off-by: Luiz Capitulino   
> 
> Weird that nobody noticed this when patch was applied… Thanks.
> Reviewed-by: Liran Alon 

I forgot to mention that I noticed this because I have systems
disabling ple with ple_gap=0 in modprobe.conf. In those systems
kvm_intel won't load anymore.

> 
> > ---
> > arch/x86/kvm/vmx.c | 1 +
> > 1 file changed, 1 insertion(+)
> > 
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 4555077d69ce..be6f13f1c25f 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -174,6 +174,7 @@ module_param_named(preemption_timer, 
> > enable_preemption_timer, bool, S_IRUGO);
> >  * refer SDM volume 3b section 21.6.13 & 22.1.3.
> >  */
> > static unsigned int ple_gap = KVM_DEFAULT_PLE_GAP;
> > +module_param(ple_gap, uint, 0444);
> > 
> > static unsigned int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
> > module_param(ple_window, uint, 0444);
> > -- 
> > 2.17.2
> >   
> 



Re: [RFC][PATCH 07/14] fgraph: Add new fgraph_ops structure to enable function graph hooks

2018-11-23 Thread Steven Rostedt
On Thu, 22 Nov 2018 18:59:46 -0800
Joel Fernandes  wrote:

> >  static struct ftrace_ops ftrace_profile_ops __read_mostly = {
> > diff --git a/kernel/trace/trace_functions_graph.c 
> > b/kernel/trace/trace_functions_graph.c
> > index 0e0ff08357cf..7c7fd13d2373 100644
> > --- a/kernel/trace/trace_functions_graph.c
> > +++ b/kernel/trace/trace_functions_graph.c
> > @@ -336,17 +336,25 @@ static void trace_graph_thresh_return(struct 
> > ftrace_graph_ret *trace)
> > trace_graph_return(trace);
> >  }
> >  
> > +static struct fgraph_ops funcgraph_threash_ops = {  
> 
> minor nit: should be funcgraph_thresh_ops ?

At least I got it wrong consistently :-)

Fixed, thanks!

-- Steve



RE: [PATCH] perf symbols: Cannot disassemble some routines when debuginfo present

2018-11-23 Thread Eric Saint Etienne
> > +   /*
> > +* When using -ffunction-sections, only .text gets loaded by
> > +* map_groups__find() into al->map. Consequently al->map address
> > +* range encompass the whole code.
> > +*
> > +* But map__load() has just loaded many function maps by
> > +* splitting al->map, which reduced al->map range drastically.
> > +* Very likely the target address is now in one of those newly
> > +* created function maps, so we need to lookup the map again
> > +* to find that new map.
> > +*/
> 
> hum, so map__load actualy can split the map to create new maps?
> 
> cold you please point me to that code? I haven't touch this area for some
> time and I can't find it

The split happens in dso_process_kernel_symbol() in symbol-elf.c where we
call map_groups__find_by_name() to find an existing map, but with
-ffunction-sections and a symbol belonging to new (function) map, such map
doesn't exist yet so we end up creating one and adjusting existing maps
accordingly because adjust_kernel_syms is set. Makes sense?

As of 4.20-rc3 the call chain is as follows:
event:c:1573   map__load()
map.c:315  dso__load()
symobl.c:1528  dso__load_kernel_sym()
symbol.c:1896  dso__load_vmlinux_path()
   (or we directly call dso__load_vmlinux() at line 1892)
symbol.c:1744  dso__load_vmlinux()
symbol.c:1719  dso__load_sym()
symbol-elf.c:1090  dso_process_kernel_symbol()

-eric


Re: [PATCH] serial: 8250: Default SERIAL_OF_PLATFORM to SERIAL_8250

2018-11-23 Thread Guenter Roeck
On Mon, Nov 19, 2018 at 12:50:50PM -0800, Guenter Roeck wrote:
> On Mon, Nov 19, 2018 at 10:44:30AM -0800, Florian Fainelli wrote:
> > On 11/15/18 5:16 PM, Guenter Roeck wrote:
> > > On Thu, Nov 15, 2018 at 11:48:20AM -0800, Florian Fainelli wrote:
> > >>
> > >> OK, would you mind testing this below? It seems to me that 8250_of.c is
> > >> incompatible with arch/powerpc/kernel/legacy_serial.c and that is what
> > >> is causing the issue here.
> > >>
> > >> diff --git a/drivers/tty/serial/8250/Kconfig
> > >> b/drivers/tty/serial/8250/Kconfig
> > >> index d7737dca0e48..21cb14cbd34a 100644
> > >> --- a/drivers/tty/serial/8250/Kconfig
> > >> +++ b/drivers/tty/serial/8250/Kconfig
> > >> @@ -483,7 +483,7 @@ config SERIAL_8250_PXA
> > >>
> > >>  config SERIAL_OF_PLATFORM
> > >> tristate "Devicetree based probing for 8250 ports"
> > >> -   depends on SERIAL_8250 && OF
> > >> +   depends on SERIAL_8250 && OF && !(PPC && PPC_UDBG_16550)
> > >> default SERIAL_8250
> > >> help
> > >>   This option is used for all 8250 compatible serial ports that
> > > 
> > > 44x/virtex5_defconfig has both PPC_UDBG_16550 and SERIAL_OF_PLATFORM 
> > > enabled
> > > and fails to boot (or display anything on the console) with this patch 
> > > applied.
> > 
> > Thanks for trying, can you either share or provide a link to the mpc85xx
> > and ml507 qemu command lines that you use? I spent a good chunk of my
> > time trying to get a kernel to boot but has failed so far.
> > 
> 

Any update ? I still see the boot failures in next-20181123.

Guenter

> Good to hear that this doesn't just happen to me ;-).
> 
> The scripts are all at https://github.com/groeck/linux-build-test/.
> This includes root file systems. The one used below is at
> https://github.com/groeck/linux-build-test/blob/master/rootfs/ppc/rootfs.cpio.gz
> 
> ml507:
> 
> qemu-system-ppc -kernel vmlinux -M virtex-ml507 -m 256 -no-reboot \
>   -initrd rootfs.cpio -dtb arch/powerpc/boot/dts/virtex440-ml507.dtb \
>   --append 'rdinit=/sbin/init panic=-1 mem=256M console=ttyS0' \
>   -monitor none -nographic
> 
> mpc85xx:
> 
> qemu-system-ppc -kernel arch/powerpc/boot/uImage -M mpc8544ds -m 256 \
>   -no-reboot -initrd rootfs.cpio \
>   --append 'rdinit=/sbin/init panic=-1 mem=256M console=ttyS0' \
>   -monitor none -nographic
> 
> Hope this helps,
> Guenter


Re: [RFC][PATCH 06/14] fgraph: Move function graph specific code into fgraph.c

2018-11-23 Thread Steven Rostedt
On Fri, 23 Nov 2018 12:58:34 -0500
Steven Rostedt  wrote:

> I think the better answer is to move it into trace_functions_graph.c.

I take that back. I think the better answer is to not call that
function if the profiler is not set, nor have that option even
available. Because it has no meaning without the profiler.

-- Steve


Re: dcache_readdir NULL inode oops

2018-11-23 Thread Will Deacon
Hi all,

I've now managed to reproduce this on x86 (log below) but I'm out of my
depth with this one. Looping in Greg and Jiri because I fear this is
specific to the pty code. Rest of the thread is here:

http://lkml.kernel.org/r/20181109143744.GA12128@hc

On Wed, Nov 21, 2018 at 01:19:06PM +, Jan Glauber wrote:
> On Tue, Nov 20, 2018 at 07:03:17PM +, Will Deacon wrote:
> > On Tue, Nov 20, 2018 at 06:28:54PM +, Will Deacon wrote:
> > > On Sat, Nov 10, 2018 at 11:17:03AM +, Jan Glauber wrote:
> > > > On Fri, Nov 09, 2018 at 03:58:56PM +, Will Deacon wrote:
> > > > > On Fri, Nov 09, 2018 at 02:37:51PM +, Jan Glauber wrote:
> > > > > > I'm seeing the following oops reproducible with upstream kernel on 
> > > > > > arm64
> > > > > > (ThunderX2):
> > > > >
> > > > > [...]
> > > > >
> > > > > > It happens after 1-3 hours of running 'stress-ng --dev 128'. This 
> > > > > > testcase
> > > > > > does a scandir of /dev and then calls random stuff like ioctl, 
> > > > > > lseek,
> > > > > > open/close etc. on the entries. I assume no files are deleted under 
> > > > > > /dev
> > > > > > during the testcase.
> > > > > >
> > > > > > The NULL pointer is the inode pointer of next. The next 
> > > > > > dentry->d_flags is
> > > > > > DCACHE_RCUACCESS when this happens.
> > > > > >
> > > > > > Any hints on how to further debug this?
> > > > >
> > > > > Can you reproduce the issue with vanilla -rc1 and do you have a 
> > > > > "known good"
> > > > > kernel?
> > > >
> > > > I can try out -rc1, but IIRC this wasn't bisectible as the bug was 
> > > > present at
> > > > least back to 4.14. I need to double check that as there were other 
> > > > issues
> > > > that are resolved now so I may confuse things here. I've defintely seen
> > > > the same bug with 4.18.
> > > >
> > > > Unfortunately I lost access to the machine as our data center seems to 
> > > > be
> > > > moving currently so it might take some days until I can try -rc1.
> > >
> > > Ok, I've just managed to reproduce this in a KVM guest running v4.20-rc3 
> > > on
> > > both the host and the guest, so if anybody has any ideas of things to try 
> > > then
> > > I'm happy to give them a shot. In the meantime, I'll try again with a 
> > > bunch of
> > > debug checks enabled.
> 
> good that you can reproduce the issue. I've verified that the issue is
> indeed reproducible with 4.14.
> 
> > 
> > Weee, I eventually hit a use-after-free from KASAN. See below.
> 
> I ran KASAN (and all the other debug stuff) but didn't trigger anything
> in the host.

Doing some more debugging, it looks like the usual failure case is where
one CPU clears the inode field in the dentry via:

devpts_pty_kill()
-> d_delete()   // dentry->d_lockref.count == 1
-> dentry_unlink_inode()

whilst another CPU gets a pointer to the dentry via:

sys_getdents64()
-> iterate_dir()
-> dcache_readdir()
-> next_positive()

and explodes on the subsequent inode dereference when trying to pass the
inode number to dir_emit():

if (!dir_emit(..., d_inode(next)->i_ino, ...))

Indeed, the hack below triggers a warning, indicating that the inode
is being cleared concurrently.

I can't work out whether the getdents64() path should hold a refcount
to stop d_delete() in its tracks, or whether devpts_pty_kill() shouldn't
be calling d_delete() like this at all.

Any help debugging this would be really appreciated.

Will

--->8

diff --git a/fs/libfs.c b/fs/libfs.c
index 0fb590d79f30..fe3f72c5cf72 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -117,6 +117,7 @@ static struct dentry *next_positive(struct dentry *parent,
if (unlikely(*seq != n))
goto retry;
}
+   WARN_ON(res && !d_inode(res));
return res;
 }
 
--->8

[  410.560885] BUG: unable to handle kernel NULL pointer dereference at 

[  410.561458] PGD 0 P4D 0 
[  410.561458] Oops:  [#1] PREEMPT SMP PTI
[  410.561458] CPU: 1 PID: 983 Comm: stress-ng-dev Not tainted 4.20.0-rc3 #1
[  410.561458] RIP: 0010:dcache_readdir+0xd6/0x160
[  410.561458] Code: 7d 48 c7 43 08 02 00 00 00 4d 8d a5 a0 00 00 00 45 31 f6 
eb 41 48 8b 45 30 48 8b 4b 08 48 89 df 8b 55 24 48 8b 75 28 4c 8b 13 <44> 0f b7 
08 4c 8b 40 40 66 41 c1 e9 0c 41 83 e1 0f e8 24 5e c1 00
[  410.565432] RSP: 0018:b0a2004e7e48 EFLAGS: 00010286
[  410.565432] RAX:  RBX: b0a2004e7ec0 RCX: 0002
[  410.565432] RDX: 0001 RSI: 9ccef8396938 RDI: b0a2004e7ec0
[  410.565432] RBP: 9ccef8396900 R08: 0001 R09: 
[  410.565432] R10: 947d7070 R11:  R12: 9ccefcc23e20
[  410.565432] R13: 9ccefcc23d80 R14:  R15: 9ccef7c1c0c0
[  410.565432] FS:  7f1883901740() GS:9ccefd70() 
knlGS:
[  410.565432] CS:  0010 DS:  ES:  CR0: 00

Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce

2018-11-23 Thread Jason Gunthorpe
On Fri, Nov 23, 2018 at 04:02:42PM +0800, Kenneth Lee wrote:

> It is already part of Jean's patchset. And that's why I built my solution on
> VFIO in the first place. But I think the concept of SVA and PASID is not
> compatible with the original VFIO concept space. You would not share your 
> whole
> address space to a device at all in a virtual machine manager,
> wouldn't you?

Why not? That seems to fit VFIO's space just fine to me.. You might
need a new upcall to create a full MM registration, but that doesn't
seem unsuited.

Part of the point here is you should try to make sensible revisions to
existing subsystems before just inventing a new thing...

VFIO is deeply connected to the IOMMU, so enabling more general IOMMU
based approache seems perfectly fine to me..

> > Once the VFIO driver knows about this as a generic capability then the
> > device it exposes to userspace would use CPU addresses instead of DMA
> > addresses.
> > 
> > The question is if your driver needs much more than the device
> > agnostic generic services VFIO provides.
> > 
> > I'm not sure what you have in mind with resource management.. It is
> > hard to revoke resources from userspace, unless you are doing
> > kernel syscalls, but then why do all this?
> 
> Say, I have 1024 queues in my accelerator. I can get one by opening the device
> and attach it with the fd. If the process exit by any means, the queue can be
> returned with the release of the fd. But if it is mdev, it will still be there
> and some one should tell the allocator it is available again. This is not easy
> to design in user space.

?? why wouldn't the mdev track the queues assigned using the existing
open/close/ioctl callbacks?

That is basic flow I would expect:

 open(/dev/vfio)
 ioctl(unity map entire process MM to mdev with IOMMU)

 // Create a HQ queue and link the PASID in the HW to this HW queue
 struct hw queue[..];
 ioctl(create HW queue)

 // Get BAR doorbell memory for the queue
 bar = mmap()

 // Submit work to the queue using CPU addresses
 queue[0] = ...
 writel(bar [..], &queue);

 // Queue, SVA, etc is cleaned up when the VFIO closes
 close()

Presumably the kernel has to handle the PASID and related for security
reasons, so they shouldn't go to userspace?

If there is something missing in vfio to do this is it looks pretty
small to me..

Jason


Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

2018-11-23 Thread Florian Weimer
* Rich Felker:

> On Fri, Nov 23, 2018 at 06:39:04PM +0100, Florian Weimer wrote:
>> * Rich Felker:
>> 
>> > On Fri, Nov 23, 2018 at 12:05:20PM -0500, Mathieu Desnoyers wrote:
>> >> There has been presumptions about signals being blocked when the thread
>> >> exits throughout this email thread. Out of curiosity, what code is
>> >> responsible for disabling signals in this situation ? Related to this,
>> >> is it valid to access a IE model TLS variable from a signal handler at
>> >> _any_ point where the signal handler nests over thread's execution ?
>> >> This includes early start and just before invoking the exit system call.
>> >
>> > It should be valid to access *any* TLS object like this, but the
>> > standards don't cover it well.
>> 
>> C++ makes it undefined:
>> 
>>   
>
> C also leaves access to pretty much anything from a signal handler
> undefined, but that makes signals basically useless.

Access to atomic variables of thread storage duration is defined,
though.

Thanks,
Florian


Re: [PATCH 1/2] module: Overwrite st_size instead of st_info

2018-11-23 Thread Dave Martin
On Mon, Nov 19, 2018 at 05:25:12PM +0100, Vincent Whitchurch wrote:
> st_info is currently overwritten after relocation and used to store the
> elf_type().  However, we're going to need it fix kallsyms on ARM's
> Thumb-2 kernels, so preserve st_info and overwrite the st_size field
> instead.  st_size is neither used by the module core nor by any
> architecture.
> 
> Signed-off-by: Vincent Whitchurch 
> ---
> v4: Split out to separate patch.  Use st_size instead of st_other.
> v1-v3: See PATCH 2/2
> 
>  kernel/module.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 49a405891587..3d86a38b580c 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2682,7 +2682,7 @@ static void add_kallsyms(struct module *mod, const 
> struct load_info *info)
>  
>   /* Set types up while we still have access to sections. */
>   for (i = 0; i < mod->kallsyms->num_symtab; i++)
> - mod->kallsyms->symtab[i].st_info
> + mod->kallsyms->symtab[i].st_size
>   = elf_type(&mod->kallsyms->symtab[i], info);
>  
>   /* Now populate the cut down core kallsyms for after init. */
> @@ -4061,7 +4061,7 @@ int module_get_kallsym(unsigned int symnum, unsigned 
> long *value, char *type,
>   kallsyms = rcu_dereference_sched(mod->kallsyms);
>   if (symnum < kallsyms->num_symtab) {
>   *value = kallsyms->symtab[symnum].st_value;
> - *type = kallsyms->symtab[symnum].st_info;
> + *type = kallsyms->symtab[symnum].st_size;
>   strlcpy(name, symname(kallsyms, symnum), KSYM_NAME_LEN);
>   strlcpy(module_name, mod->name, MODULE_NAME_LEN);
>   *exported = is_exported(name, *value, mod);

Based on the discussion here:

Reviewed-by: Dave Martin 

(I still think this is a bit gross, but without resorting to overkill
there's not an obvious nicer option.  Even before this patch, we were
abusing st_info.  Of course, someday somebody may find a use for
st_size and then we'd need to revisit this code...)

Cheers
---Dave


Re: [PATCH 1/2] mm: Remove redundant test from find_get_pages_contig

2018-11-23 Thread Matthew Wilcox
On Fri, Nov 23, 2018 at 01:47:32PM +0300, Kirill A. Shutemov wrote:
> On Thu, Nov 22, 2018 at 01:32:23PM -0800, Matthew Wilcox wrote:
> > After we establish a reference on the page, we check the pointer continues
> > to be in the correct position in i_pages.  There's no need to check the
> > page->mapping or page->index afterwards; if those can change after we've
> > got the reference, they can change after we return the page to the caller.
> 
> Hm. IIRC, page->mapping can be set to NULL due truncation, but what about
> index? When it can be changed? Truncation doesn't touch it.

I think index can only be changed after the refcount has hit zero and
the page is safely out of the pagecache.  I agree that page->mapping can
be set to NULL after the call to xas_reload() ... but then it can also
happen after the check, so the check isn't really buying us anything
that the xas_reload() call doesn't already check.


Re: [RFC][PATCH 06/14] fgraph: Move function graph specific code into fgraph.c

2018-11-23 Thread Steven Rostedt
On Thu, 22 Nov 2018 22:11:33 -0800
Joel Fernandes  wrote:

> On Wed, Nov 21, 2018 at 08:27:14PM -0500, Steven Rostedt wrote:
> > From: "Steven Rostedt (VMware)" 
> > 
> > To make the function graph infrastructure more managable, the code needs to
> > be in its own file (fgraph.c). Move the code that is specific for managing
> > the function graph infrastructure out of ftrace.c and into fgraph.c
> > 
> > Signed-off-by: Steven Rostedt (VMware)   
> 
> I think this patch causes a build error if CONFIG_FUNCTION_PROFILER is
> disabled but function graph is enabled. The following diff fixes it for me.

Thanks for reporting this.

> 
> thanks,
> 
>  - Joel
> 
>  8<--
>  
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 3b8c307c7ff0..ce38bb962f91 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -382,6 +382,15 @@ static void ftrace_update_pid_func(void)
>   update_ftrace_function();
>  }
>  
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> +static bool fgraph_graph_time = true;
> +
> +void ftrace_graph_graph_time_control(bool enable)
> +{
> + fgraph_graph_time = enable;
> +}
> +#endif
> +

I think the better answer is to move it into trace_functions_graph.c.

I'll do that and give you the "reported-by".

-- Steve


>  #ifdef CONFIG_FUNCTION_PROFILER
>  struct ftrace_profile {
>   struct hlist_node   node;
> @@ -783,12 +792,6 @@ function_profile_call(unsigned long ip, unsigned long 
> parent_ip,
>  }
>  
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> -static bool fgraph_graph_time = true;
> -
> -void ftrace_graph_graph_time_control(bool enable)
> -{
> - fgraph_graph_time = enable;
> -}
>  
>  static int profile_graph_entry(struct ftrace_graph_ent *trace)
>  {



Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

2018-11-23 Thread Mathieu Desnoyers
- On Nov 23, 2018, at 12:30 PM, Rich Felker dal...@libc.org wrote:

> On Fri, Nov 23, 2018 at 12:05:20PM -0500, Mathieu Desnoyers wrote:
>> - On Nov 23, 2018, at 9:28 AM, Rich Felker dal...@libc.org wrote:
>> [...]
>> > 
>> > Absolutely. As long as it's in libc, implicit destruction will happen.
>> > Actually I think the glibc code shound unconditionally unregister the
>> > rseq address at exit (after blocking signals, so no application code
>> > can run) in case a third-party rseq library was linked and failed to
>> > do so before thread exit (e.g. due to mismatched ref counts) rather
>> > than respecting the reference count, since it knows it's the last
>> > user. This would make potentially-buggy code safer.
>> 
>> OK, let me go ahead with a few ideas/questions along that path.
> ^^^
>> 
>> Let's say our stated goal is to let the "exit" system call from the
>> glibc thread exit path perform rseq unregistration (without explicit
>> unregistration beforehand). Let's look at what we need.
> 
> This is not "along that path". The above-quoted text is not about
> assuming it's safe to make SYS_exit without unregistering the rseq
> object, but rather about glibc being able to perform the
> rseq-unregister syscall without caring about reference counts, since
> it knows no other code that might depend on rseq can run after it.

When saying "along that path", what I mean is: if we go in that direction,
then we should look into going all the way there, and rely on thread
exit to implicitly unregister the TLS area.

Do you see any reason for doing an explicit unregistration at thread
exit rather than simply rely on the exit system call ?


> 
>> First, we need the TLS area to be valid until the exit system call
>> is invoked by the thread. If glibc defines __rseq_abi as a weak symbol,
>> I'm not entirely sure we can guarantee the IE model if another library
>> gets its own global-dynamic weak symbol elected at execution time. Would
>> it be better to switch to a "strong" symbol for the glibc __rseq_abi
>> rather than weak ?
> 
> This doesn't help; still whichever comes first in link order would
> override. Either way __rseq_abi would be in static TLS, though,
> because any dynamically-loaded library is necessarily loaded after
> libc, which is loaded at initial exec time.

OK, AFAIU so you argue for leaving the __rseq_abi symbol "weak". Just making
sure I correctly understand your position.

Something can be technically correct based on the current implementation,
but fragile with respect to future changes. We need to carefully distinguish
between the two when exposing ABIs.

> 
>> There has been presumptions about signals being blocked when the thread
>> exits throughout this email thread. Out of curiosity, what code is
>> responsible for disabling signals in this situation ?

This question is still open.

> Related to this,
>> is it valid to access a IE model TLS variable from a signal handler at
>> _any_ point where the signal handler nests over thread's execution ?
>> This includes early start and just before invoking the exit system call.
> 
> It should be valid to access *any* TLS object like this, but the
> standards don't cover it well. Right now access to dynamic TLS from
> signal handlers is unsafe in glibc, but static is safe.

Which is a shame for the lttng-ust tracer, which needs global-dynamic
TLS variables so it can be dlopen'd, but aims at allowing tracing from
signal handlers. It looks like due to limitations of global-dynamic
TLS, tracing from instrumented signal handlers with lttng-ust tracepoints
could crash the process if the signal handler nests early at thread start
or late before thread exit. One way out of this would be to ensure signals
are blocked at thread start/exit, but I can't find the code responsible for
doing this within glibc.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [RESEND PATCH v3 2/2] PCI: uniphier: Add UniPhier PCIe host controller support

2018-11-23 Thread Lorenzo Pieralisi
On Tue, Nov 20, 2018 at 09:15:31PM +0900, Kunihiko Hayashi wrote:

[...]

> > > +static int uniphier_pcie_link_up(struct dw_pcie *pci)
> > 
> > This function returns a bool value, make it return a bool.
> 
> This function is registered in struct dw_pcie_ops.link_up, that is defined
> in pcie-designware.h as follows:
> 
>int (*link_up)(struct dw_pcie *pcie);
> 
> Changing the return type of this function makes type mismatch,
> so I think it's difficult to change it.

You are right, it is fine.

Thanks,
Lorenzo


[PATCH] find_bit_benchmark: align test_find_next_and_bit with others

2018-11-23 Thread Yury Norov
In contrary to other tests, the test_find_next_and_bit uses tab
formatting in output and get_cycles() instead of ktime_get().
get_cycles() is not supported by some arches, so ktime_get()
fits better in generic code.

Fix it and minor style issues, so the output looks like this:

Start testing find_bit() with random-filled bitmap
find_next_bit: 7142816 ns, 163282 iterations
find_next_zero_bit:8545712 ns, 164399 iterations
find_last_bit: 6332032 ns, 163282 iterations
find_first_bit:   20509424 ns,  16606 iterations
find_next_and_bit: 4060016 ns,  73424 iterations

Start testing find_bit() with sparse bitmap
find_next_bit:   55984 ns,656 iterations
find_next_zero_bit:   19197536 ns, 327025 iterations
find_last_bit:   65088 ns,656 iterations
find_first_bit:5923712 ns,656 iterations
find_next_and_bit:   29088 ns,  1 iterations

Signed-off-by: Yury Norov 
---
 lib/find_bit_benchmark.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/lib/find_bit_benchmark.c b/lib/find_bit_benchmark.c
index 5367ffa5c18f..f0e394dd2beb 100644
--- a/lib/find_bit_benchmark.c
+++ b/lib/find_bit_benchmark.c
@@ -108,14 +108,13 @@ static int __init test_find_next_and_bit(const void 
*bitmap,
const void *bitmap2, unsigned long len)
 {
unsigned long i, cnt;
-   cycles_t cycles;
+   ktime_t time;
 
-   cycles = get_cycles();
+   time = ktime_get();
for (cnt = i = 0; i < BITMAP_LEN; cnt++)
-   i = find_next_and_bit(bitmap, bitmap2, BITMAP_LEN, i+1);
-   cycles = get_cycles() - cycles;
-   pr_err("find_next_and_bit:\t\t%llu cycles, %ld iterations\n",
-   (u64)cycles, cnt);
+   i = find_next_and_bit(bitmap, bitmap2, BITMAP_LEN, i + 1);
+   time = ktime_get() - time;
+   pr_err("find_next_and_bit:  %18llu ns, %6ld iterations\n", time, cnt);
 
return 0;
 }
-- 
2.17.1



Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

2018-11-23 Thread Rich Felker
On Fri, Nov 23, 2018 at 06:39:04PM +0100, Florian Weimer wrote:
> * Rich Felker:
> 
> > On Fri, Nov 23, 2018 at 12:05:20PM -0500, Mathieu Desnoyers wrote:
> >> There has been presumptions about signals being blocked when the thread
> >> exits throughout this email thread. Out of curiosity, what code is
> >> responsible for disabling signals in this situation ? Related to this,
> >> is it valid to access a IE model TLS variable from a signal handler at
> >> _any_ point where the signal handler nests over thread's execution ?
> >> This includes early start and just before invoking the exit system call.
> >
> > It should be valid to access *any* TLS object like this, but the
> > standards don't cover it well.
> 
> C++ makes it undefined:
> 
>   

C also leaves access to pretty much anything from a signal handler
undefined, but that makes signals basically useless. POSIX
inadvertently defines a lot more than it wanted to by ignoring
indirect ways you can access objects using AS-safe functions to pass
around their addresses; there's an open issue for this:

http://austingroupbugs.net/view.php?id=728

I think it's reasonable to say, based on how fond POSIX is of signals
for realtime stuff, that it should allow some reasonable operations,
but just be more careful about what it allows, and disallowing access
to TLS would preclude the only ways to make signals non-awful for
multithreaded processes.

Rich


Re: [PATCH] KVM: VMX: re-add ple_gap module parameter

2018-11-23 Thread Liran Alon



> On 23 Nov 2018, at 19:02, Luiz Capitulino  wrote:
> 
> 
> Apparently, the ple_gap parameter was accidentally removed
> by commit c8e88717cfc6b36bedea22368d97667446318291. Add it
> back.
> 
> Signed-off-by: Luiz Capitulino 

Weird that nobody noticed this when patch was applied… Thanks.
Reviewed-by: Liran Alon 

> ---
> arch/x86/kvm/vmx.c | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 4555077d69ce..be6f13f1c25f 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -174,6 +174,7 @@ module_param_named(preemption_timer, 
> enable_preemption_timer, bool, S_IRUGO);
>  * refer SDM volume 3b section 21.6.13 & 22.1.3.
>  */
> static unsigned int ple_gap = KVM_DEFAULT_PLE_GAP;
> +module_param(ple_gap, uint, 0444);
> 
> static unsigned int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
> module_param(ple_window, uint, 0444);
> -- 
> 2.17.2
> 



Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-23 Thread Linus Torvalds
On Fri, Nov 23, 2018 at 8:36 AM Linus Torvalds
 wrote:
>
> Let me write a generic routine in lib/iomap_copy.c (which already does
> the "user specifies chunk size" cases), and hook it up for x86.

Something like this?

ENTIRELY UNTESTED! It might not compile. Seriously. And if it does
compile, it might not work.

And this doesn't actually do the memset_io() function at all, just the
memcpy ones.

Finally, it's worth noting that on x86, we have this:

  /*
   * override generic version in lib/iomap_copy.c
   */
  ENTRY(__iowrite32_copy)
  movl %edx,%ecx
  rep movsd
  ret
  ENDPROC(__iowrite32_copy)

because back in 2006, we did this:

[PATCH] Add faster __iowrite32_copy routine for x86_64

This assembly version is measurably faster than the generic version in
lib/iomap_copy.c.

which actually implies that "rep movsd" is faster than doing
__raw_writel() by hand.

So it is possible that this should all be arch-specific code rather
than that butt-ugly "generic" code I wrote in this patch.

End result: I'm not really all that  happy about this patch, but it's
perhaps worth testing, and it's definitely worth discussing. Because
our current memcpy_{to,from}io() is truly broken garbage.

   Linus
 arch/x86/include/asm/io.h |   6 ++
 include/linux/io.h|   2 +
 lib/iomap_copy.c  | 153 ++
 3 files changed, 161 insertions(+)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 832da8229cc7..3b9206ee25b8 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -92,6 +92,12 @@ build_mmio_write(__writel, "l", unsigned int, "r", )
 
 #define mmiowb() barrier()
 
+void __iowrite_copy(void __iomem *to, const void *from, size_t count);
+void __ioread_copy(void *to, const void __iomem *from, size_t count);
+
+#define memcpy_toio __iowrite_copy
+#define memcpy_fromio __ioread_copy
+
 #ifdef CONFIG_X86_64
 
 build_mmio_read(readq, "q", u64, "=r", :"memory")
diff --git a/include/linux/io.h b/include/linux/io.h
index 32e30e8fb9db..642f78970018 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -28,6 +28,8 @@
 struct device;
 struct resource;
 
+void __ioread_copy(void *to, const void __iomem *from, size_t count);
+void __iowrite_copy(void __iomem *to, const void *from, size_t count);
 __visible void __iowrite32_copy(void __iomem *to, const void *from, size_t count);
 void __ioread32_copy(void *to, const void __iomem *from, size_t count);
 void __iowrite64_copy(void __iomem *to, const void *from, size_t count);
diff --git a/lib/iomap_copy.c b/lib/iomap_copy.c
index b8f1d6cbb200..8edc359dda62 100644
--- a/lib/iomap_copy.c
+++ b/lib/iomap_copy.c
@@ -17,6 +17,159 @@
 
 #include 
 #include 
+#include 
+
+static inline bool iomem_align(const void __iomem *ptr, int size, int count)
+{
+	return count >= size && (__force unsigned long)ptr & size;
+}
+
+
+/**
+ * __iowrite_copy - copy data to MMIO space
+ * @to: destination, in MMIO space
+ * @from: source
+ * @count: number of bytes to copy.
+ *
+ * Copy arbitrarily aligned data from kernel space to MMIO space,
+ * using reasonable chunking.
+ */
+void __attribute__((weak)) __iowrite_copy(void __iomem *to,
+	  const void *from,
+	  size_t count)
+{
+	if (iomem_align(to, 1, count)) {
+		unsigned char data = *(unsigned char *)from;
+		__raw_writeb(data, to);
+		from++;
+		to++;
+		count--;
+	}
+	if (iomem_align(to, 2, count)) {
+		unsigned short data = get_unaligned((unsigned short *)from);
+		__raw_writew(data, to);
+		from += 2;
+		to += 2;
+		count -= 2;
+	}
+#ifdef CONFIG_64BIT
+	if (iomem_align(to, 4, count)) {
+		unsigned int data = get_unaligned((unsigned int *)from);
+		__raw_writel(data, to);
+		from += 4;
+		to += 4;
+		count -= 4;
+	}
+#endif
+	while (count >= sizeof(unsigned long)) {
+		unsigned long data = get_unaligned((unsigned long *)from);
+#ifdef CONFIG_64BIT
+		__raw_writeq(data, to);
+#else
+		__raw_writel(data, to);
+#endif
+		from += sizeof(unsigned long);
+		to += sizeof(unsigned long);
+		count -= sizeof(unsigned long);
+	}
+
+#ifdef CONFIG_64BIT
+	if (count >= 4) {
+		unsigned int data = get_unaligned((unsigned int *)from);
+		__raw_writel(data, to);
+		from += 4;
+		to += 4;
+		count -= 4;
+	}
+#endif
+
+	if (count >= 2) {
+		unsigned short data = get_unaligned((unsigned short *)from);
+		__raw_writew(data, to);
+		from += 2;
+		to += 2;
+		count -= 2;
+	}
+
+	if (count) {
+		unsigned char data = *(unsigned char *)from;
+		__raw_writeb(data, to);
+	}
+}
+EXPORT_SYMBOL_GPL(__iowrite_copy);
+
+/**
+ * __ioread_copy - copy data from MMIO space
+ * @to: destination
+ * @from: source, in MMIO space
+ * @count: number of bytes to copy.
+ *
+ * Copy arbitrarily aligned data from MMIO space to kernel space,
+ * using reasonable chunking.
+ */
+void __attribute__((weak)) __ioread_copy(void *to,
+	 const void __iomem *from,
+	 size_t count)
+{
+	if (iomem_align(from, 1, count)) {
+		unsigned char data =

Greetings from Mr Noble

2018-11-23 Thread Abbott Noble
Greetings to you and your family.

 My name is Mr. Noble Abbott, the director general with the bank,
Africa Develop bank (ADB) Ouagadougou, Burkina Faso, in West Africa. I
am contacting you to seek our honesty and sincere cooperation in
confidential manner to transfer the sum of 10.5 (Ten million five
hundred thousand Dollars) to your existing or new bank account.

 This money belongs to one of our bank client, a Libyan oil exporter
who was working with the former Libyan government; I learn t that he
was killed by the revolutionary forces since October 2011. Our bank is
planning to transfer this entire fund into the government public
treasury as unclaimed fund if nobody comes to claim the money from our
bank after four years without account activities .

 We did not know each other before, but due to the fact that the
deceased is a foreigner, the bank will welcome any claim from a
foreigner without any suspect, that is why I decided to look for
someone whim I can trust to come and claim the fund from our bank.

 I will endorse your name in the deceased client file here in my
office which will indicate to that the deceased is your legal joint
account business partner or family member next of kin to the deceased
and officially the bank will transfer the fund to your bank account
within seven working days in accordance to our banking inheritance
rules and fund claim regulation.

 I will share 40% for you and 60% for me after the fund is transferred
to your bank account, we need to act fast to complete this transaction
within seven days. I will come to your country to collect my share
after the fund is transferred to your bank account in your country. I
hope that you will not disappoint me after the fund is transferred to
your bank account in your country.

 Please I want you to send me your private phone number so that I can
call you to discuss more details on how we can proceed on this project

 Waiting for your urgent response today
 Yours sincerely


Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

2018-11-23 Thread Florian Weimer
* Rich Felker:

> On Fri, Nov 23, 2018 at 12:05:20PM -0500, Mathieu Desnoyers wrote:
>> There has been presumptions about signals being blocked when the thread
>> exits throughout this email thread. Out of curiosity, what code is
>> responsible for disabling signals in this situation ? Related to this,
>> is it valid to access a IE model TLS variable from a signal handler at
>> _any_ point where the signal handler nests over thread's execution ?
>> This includes early start and just before invoking the exit system call.
>
> It should be valid to access *any* TLS object like this, but the
> standards don't cover it well.

C++ makes it undefined:

  

Thanks,
Florian


[PATCH] clk: davinci: check for devm_kasprintf() failure

2018-11-23 Thread Nicholas Mc Guire
devm_kasprintf() may return NULL on failure of internal allocation thus
the assignment to  lpsc->pm_domain.name  is not safe if not checked. On 
error davinci_lpsc_clk_register() returns a pointer to davinci_lpsc_clk
which is checked with IS_ERR() so returning ERR_PTR(-ENOMEM) should be 
fine here.

Signed-off-by: Nicholas Mc Guire 
Fixes: c6ed4d734bc7 ("clk: davinci: New driver for davinci PSC clocks")
---

Problem located with experimental coccinelle script

Patch was compile tested with: davinci_all_defconfig

Patch is against 4.20-rc3 (localversion-next is next-20181123)

 drivers/clk/davinci/psc.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/clk/davinci/psc.c b/drivers/clk/davinci/psc.c
index 5b69e24..ca9b5c3 100644
--- a/drivers/clk/davinci/psc.c
+++ b/drivers/clk/davinci/psc.c
@@ -278,6 +278,11 @@ davinci_lpsc_clk_register(struct device *dev, const char 
*name,
 
lpsc->pm_domain.name = devm_kasprintf(dev, GFP_KERNEL, "%s: %s",
  best_dev_name(dev), name);
+   if (!lpsc->pm_domain.name) {
+   kfree(lpsc);
+   return ERR_PTR(-ENOMEM);
+   }
+
lpsc->pm_domain.attach_dev = davinci_psc_genpd_attach_dev;
lpsc->pm_domain.detach_dev = davinci_psc_genpd_detach_dev;
lpsc->pm_domain.flags = GENPD_FLAG_PM_CLK;
-- 
2.1.4



Re: [PATCH 2/2] page cache: Store only head pages in i_pages

2018-11-23 Thread Kirill A. Shutemov
On Fri, Nov 23, 2018 at 09:19:00AM -0800, Matthew Wilcox wrote:
> On Fri, Nov 23, 2018 at 01:56:44PM +0300, Kirill A. Shutemov wrote:
> > On Thu, Nov 22, 2018 at 01:32:24PM -0800, Matthew Wilcox wrote:
> > > Transparent Huge Pages are currently stored in i_pages as pointers to
> > > consecutive subpages.  This patch changes that to storing consecutive
> > > pointers to the head page in preparation for storing huge pages more
> > > efficiently in i_pages.
> > 
> > I probably miss something, I don't see how it wouldn't break
> > split_huge_page().
> > 
> > I don't see what would replace head pages in i_pages with
> > formerly-tail-pages?
> 
> You're quite right.  Where's your test-suite?  ;-)

Yeah-yeah...

> I think this should do the job:
> 
> +++ b/mm/huge_memory.c
> @@ -2464,6 +2464,9 @@ static void __split_huge_page(struct page *page, struct 
> list_head *list,
> if (IS_ENABLED(CONFIG_SHMEM) && PageSwapBacked(head))
> shmem_uncharge(head->mapping->host, 1);
> put_page(head + i);
> +   } else if (!PageAnon(page)) {
> +   __xa_store(&head->mapping->i_pages, head[i].index,
> +   head + i, 0);
> }
> }

Looks good to me. But I still need to look into the rest of the patch.

> Having looked at this area, I think there was actually a bug in the patch
> you wrote that I'm cribbing from.  You inserted the tail pages before
> calling __split_huge_page_tail(), so a racing lookup would have found
> a tail page before it got transformed into a non-tail page.

I don't think so.

The page still has refcount==0 and any lookup of the page suppose to fail
due to !page_cache_get_speculative() or block on tree lock.

-- 
 Kirill A. Shutemov


Re: [RFC][PATCH 02/14] fgraph: Have set_graph_notrace only affect function_graph tracer

2018-11-23 Thread Steven Rostedt
On Fri, 23 Nov 2018 09:01:18 +0900
Namhyung Kim  wrote:

> Acked-by: Namhyung Kim 

Thanks Namhyung!

-- Steve


My Greetings

2018-11-23 Thread Mrs. Marianne Jeanne
Beloved,
I am writing this mail to you with heavy tears in my eyes and great
sorrow in my heart.  As I informed you earlier, I am (Mrs.)Marianne 
Jeanne, 
suffering from long time Cancer. From all indications 
my condition is really deteriorating and it's quite obvious 
that I won't live more than 2 months according to my doctors.

I have some funds I inherited from my late loving husband Mr. Jeanne
Smith, the sum of (€8.5000.000) which he deposited in a Bank. I need
a very honest and God fearing person that can use these funds for
Charity work, helping the Less Privileges, and 20% of this money will
be for your time and expenses, while 80% goes to charities.

Please let me know if I can TRUST YOU ON THIS to carry out this favour
for me.  I look forward to your prompt reply for more details.

Yours sincerely
Marianne Jeanne


Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

2018-11-23 Thread Rich Felker
On Fri, Nov 23, 2018 at 12:05:20PM -0500, Mathieu Desnoyers wrote:
> - On Nov 23, 2018, at 9:28 AM, Rich Felker dal...@libc.org wrote:
> [...]
> > 
> > Absolutely. As long as it's in libc, implicit destruction will happen.
> > Actually I think the glibc code shound unconditionally unregister the
> > rseq address at exit (after blocking signals, so no application code
> > can run) in case a third-party rseq library was linked and failed to
> > do so before thread exit (e.g. due to mismatched ref counts) rather
> > than respecting the reference count, since it knows it's the last
> > user. This would make potentially-buggy code safer.
> 
> OK, let me go ahead with a few ideas/questions along that path.
 ^^^
> 
> Let's say our stated goal is to let the "exit" system call from the
> glibc thread exit path perform rseq unregistration (without explicit
> unregistration beforehand). Let's look at what we need.

This is not "along that path". The above-quoted text is not about
assuming it's safe to make SYS_exit without unregistering the rseq
object, but rather about glibc being able to perform the
rseq-unregister syscall without caring about reference counts, since
it knows no other code that might depend on rseq can run after it.

> First, we need the TLS area to be valid until the exit system call
> is invoked by the thread. If glibc defines __rseq_abi as a weak symbol,
> I'm not entirely sure we can guarantee the IE model if another library
> gets its own global-dynamic weak symbol elected at execution time. Would
> it be better to switch to a "strong" symbol for the glibc __rseq_abi
> rather than weak ?

This doesn't help; still whichever comes first in link order would
override. Either way __rseq_abi would be in static TLS, though,
because any dynamically-loaded library is necessarily loaded after
libc, which is loaded at initial exec time.

> There has been presumptions about signals being blocked when the thread
> exits throughout this email thread. Out of curiosity, what code is
> responsible for disabling signals in this situation ? Related to this,
> is it valid to access a IE model TLS variable from a signal handler at
> _any_ point where the signal handler nests over thread's execution ?
> This includes early start and just before invoking the exit system call.

It should be valid to access *any* TLS object like this, but the
standards don't cover it well. Right now access to dynamic TLS from
signal handlers is unsafe in glibc, but static is safe.

Rich


[PATCH] arm64/bpf: use movn/movk/movk sequence to generate kernel addresses

2018-11-23 Thread Ard Biesheuvel
On arm64, all executable code is guaranteed to reside in the vmalloc
space (or the module space), and so jump targets will only use 48
bits at most, and the remaining bits are guaranteed to be 0x1.

This means we can generate an immediate jump address using a sequence
of one MOVN (move wide negated) and two MOVK instructions, where the
first one sets the lower 16 bits but also sets all top bits to 0x1.

Signed-off-by: Ard Biesheuvel 
---

I looked into using ADRP/ADD pairs, but this is very fiddly, since
it requires knowledge about where the ADRP instruction ends up in
memory. (ADRP produces a PC-relative address with bits [11:0] cleared,
and so in addition to the distance between the instruction and the
target, we also need to know their offsets modulo 4096 and I wasn't
sure whether the offsets are guaranteed to be relative to the start
of a page or not)

 arch/arm64/net/bpf_jit_comp.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index a6fdaea07c63..3b4d2c6fc133 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -134,10 +134,9 @@ static inline void emit_a64_mov_i64(const int reg, const 
u64 val,
 }
 
 /*
- * This is an unoptimized 64 immediate emission used for BPF to BPF call
- * addresses. It will always do a full 64 bit decomposition as otherwise
- * more complexity in the last extra pass is required since we previously
- * reserved 4 instructions for the address.
+ * Kernel addresses in the vmalloc space use at most 48 bits, and the
+ * remaining bits are guaranteed to be 0x1. So we can compose the address
+ * with a fixed length movn/movk/movk sequence.
  */
 static inline void emit_addr_mov_i64(const int reg, const u64 val,
 struct jit_ctx *ctx)
@@ -145,8 +144,8 @@ static inline void emit_addr_mov_i64(const int reg, const 
u64 val,
u64 tmp = val;
int shift = 0;
 
-   emit(A64_MOVZ(1, reg, tmp & 0x, shift), ctx);
-   for (;shift < 48;) {
+   emit(A64_MOVN(1, reg, ~tmp & 0x, shift), ctx);
+   while (shift < 32) {
tmp >>= 16;
shift += 16;
emit(A64_MOVK(1, reg, tmp & 0x, shift), ctx);
@@ -627,10 +626,7 @@ static int build_insn(const struct bpf_insn *insn, struct 
jit_ctx *ctx)
const u8 r0 = bpf2a64[BPF_REG_0];
const u64 func = (u64)__bpf_call_base + imm;
 
-   if (ctx->prog->is_func)
-   emit_addr_mov_i64(tmp, func, ctx);
-   else
-   emit_a64_mov_i64(tmp, func, ctx);
+   emit_addr_mov_i64(tmp, func, ctx);
emit(A64_BLR(tmp), ctx);
emit(A64_MOV(1, r0, A64_R(0)), ctx);
break;
-- 
2.19.1



Re: [PATCH 1/2] module: Overwrite st_size instead of st_info

2018-11-23 Thread Dave Martin
On Thu, Nov 22, 2018 at 05:49:23PM +, Russell King - ARM Linux wrote:
> On Thu, Nov 22, 2018 at 06:40:45PM +0100, Ard Biesheuvel wrote:
> > On Thu, 22 Nov 2018 at 17:29, Jessica Yu  wrote:
> > >
> > > +++ Vincent Whitchurch [22/11/18 13:24 +0100]:
> > > >On Thu, Nov 22, 2018 at 12:01:54PM +, Dave Martin wrote:
> > > >> On Mon, Nov 19, 2018 at 05:25:12PM +0100, Vincent Whitchurch wrote:
> > > >> > st_info is currently overwritten after relocation and used to store 
> > > >> > the
> > > >> > elf_type().  However, we're going to need it fix kallsyms on ARM's
> > > >> > Thumb-2 kernels, so preserve st_info and overwrite the st_size field
> > > >> > instead.  st_size is neither used by the module core nor by any
> > > >> > architecture.
> > > >> >
> > > >> > Signed-off-by: Vincent Whitchurch 
> > > >> > ---
> > > >> > v4: Split out to separate patch.  Use st_size instead of st_other.
> > > >> > v1-v3: See PATCH 2/2
> > > >> >
> > > >> >  kernel/module.c | 4 ++--
> > > >> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >> >
> > > >> > diff --git a/kernel/module.c b/kernel/module.c
> > > >> > index 49a405891587..3d86a38b580c 100644
> > > >> > --- a/kernel/module.c
> > > >> > +++ b/kernel/module.c
> > > >> > @@ -2682,7 +2682,7 @@ static void add_kallsyms(struct module *mod, 
> > > >> > const struct load_info *info)
> > > >> >
> > > >> >/* Set types up while we still have access to sections. */
> > > >> >for (i = 0; i < mod->kallsyms->num_symtab; i++)
> > > >> > -  mod->kallsyms->symtab[i].st_info
> > > >> > +  mod->kallsyms->symtab[i].st_size
> > > >> >= elf_type(&mod->kallsyms->symtab[i], info);
> > > >> >
> > > >> >/* Now populate the cut down core kallsyms for after init. */
> > > >> > @@ -4061,7 +4061,7 @@ int module_get_kallsym(unsigned int symnum, 
> > > >> > unsigned long *value, char *type,
> > > >> >kallsyms = rcu_dereference_sched(mod->kallsyms);
> > > >> >if (symnum < kallsyms->num_symtab) {
> > > >> >*value = kallsyms->symtab[symnum].st_value;
> > > >> > -  *type = kallsyms->symtab[symnum].st_info;
> > > >> > +  *type = kallsyms->symtab[symnum].st_size;
> > > >> >strlcpy(name, symname(kallsyms, symnum), 
> > > >> > KSYM_NAME_LEN);
> > > >> >strlcpy(module_name, mod->name, MODULE_NAME_LEN);
> > > >> >*exported = is_exported(name, *value, mod);
> > > >>
> > > >> This is fine if st_size is really unused, but how sure are you of that?
> > > >>
> > > >> grepping for st_size throws up some hits that appear ELF-related, but
> > > >> I've not investigated them in detail.
> > > >>
> > > >> (The fact that struct stat has an identically named field throws up
> > > >> a load of false positives too.)
> > > >
> > > >$ git describe --tags
> > > >v4.20-rc3-93-g92b419289cee
> > > >
> > > >$ rg -m1 '[\.>]st_size'  --iglob '!**/tools/**' --iglob '!**/vdso*' 
> > > >--iglob '!**/scripts/**' --iglob '!**/usr/**' --iglob '!**/samples/**' | 
> > > >cat
> > > >| kernel/kexec_file.c: if (sym->st_size != size) {
> > > >
> > > >Symbols in kexec kernel.
> > > >
> > > >| fs/stat.c:   tmp.st_size = stat->size;
> > > >| Documentation/networking/tls.txt:  sendfile(sock, file, &offset, 
> > > >stat.st_size);
> > > >| net/9p/client.c: ret->st_rdev, ret->st_size, 
> > > >ret->st_blksize,
> > > >| net/9p/protocol.c:   &stbuf->st_rdev, 
> > > >&stbuf->st_size,
> > > >| fs/9p/vfs_inode_dotl.c:  i_size_write(inode, 
> > > >stat->st_size);
> > > >| fs/hostfs/hostfs_user.c: p->size = buf->st_size;
> > > >| arch/powerpc/boot/mktree.c:  nblks = (st.st_size + IMGBLK) / IMGBLK;
> > > >| arch/alpha/kernel/osf_sys.c: tmp.st_size = lstat->size;
> > > >| arch/x86/ia32/sys_ia32.c:__put_user(stat->size, 
> > > >&ubuf->st_size) ||
> > > >
> > > >Not Elf_Sym.
> > > >
> > > >| arch/x86/kernel/machine_kexec_64.c:   sym->st_size);
> > > >
> > > >Symbols in kexec kernel.
> > > >
> > > >| arch/sparc/boot/piggyback.c: st4(buffer + 12, s.st_size);
> > > >| arch/sparc/kernel/sys_sparc32.c: err |= put_user(stat->size, 
> > > >&statbuf->st_size);
> > > >| arch/um/os-Linux/file.c: .ust_size= src->st_size,
> > > >/* total size, in bytes */
> > > >| arch/um/os-Linux/start_up.c: size = (buf.st_size + UM_KERN_PAGE_SIZE) 
> > > >& ~(UM_KERN_PAGE_SIZE - 1);
> > > >| arch/s390/kernel/compat_linux.c: tmp.st_size = stat->size;
> > > >| arch/arm/kernel/sys_oabi-compat.c:   tmp.st_size = stat->size;
> > > >| arch/mips/boot/compressed/calc_vmlinuz_load_addr.c:  vmlinux_size = 
> > > >(uint64_t)sb.st_size;
> > > >| drivers/net/ethernet/marvell/sky2.c: hw->st_idx = 
> > > >RING_NEXT(hw->st_idx, hw->st_size);
> > > >
> > > >Not Elf_Sym.
> > >
> > > [ added Miroslav to CC, just in case he would like to check :) ]
> > >
> > > I have just double checked as well, a

Re: [PATCH 2/2] page cache: Store only head pages in i_pages

2018-11-23 Thread Matthew Wilcox
On Fri, Nov 23, 2018 at 01:56:44PM +0300, Kirill A. Shutemov wrote:
> On Thu, Nov 22, 2018 at 01:32:24PM -0800, Matthew Wilcox wrote:
> > Transparent Huge Pages are currently stored in i_pages as pointers to
> > consecutive subpages.  This patch changes that to storing consecutive
> > pointers to the head page in preparation for storing huge pages more
> > efficiently in i_pages.
> 
> I probably miss something, I don't see how it wouldn't break
> split_huge_page().
> 
> I don't see what would replace head pages in i_pages with
> formerly-tail-pages?

You're quite right.  Where's your test-suite?  ;-)

I think this should do the job:

+++ b/mm/huge_memory.c
@@ -2464,6 +2464,9 @@ static void __split_huge_page(struct page *page, struct 
list_head *list,
if (IS_ENABLED(CONFIG_SHMEM) && PageSwapBacked(head))
shmem_uncharge(head->mapping->host, 1);
put_page(head + i);
+   } else if (!PageAnon(page)) {
+   __xa_store(&head->mapping->i_pages, head[i].index,
+   head + i, 0);
}
}
 

Having looked at this area, I think there was actually a bug in the patch
you wrote that I'm cribbing from.  You inserted the tail pages before
calling __split_huge_page_tail(), so a racing lookup would have found
a tail page before it got transformed into a non-tail page.


Re: [PATCH] exec: separate MM_ANONPAGES and RLIMIT_STACK accounting

2018-11-23 Thread Guenter Roeck
Hi,

On Mon, Nov 12, 2018 at 05:09:10PM +0100, Oleg Nesterov wrote:
> get_arg_page() checks bprm->rlim_stack.rlim_cur and re-calculates the
> "extra" size for argv/envp pointers every time, this is a bit ugly and
> even not strictly correct: acct_arg_size() must not account this size.
> 
> Remove all the rlimit code in get_arg_page(). Instead, add bprm->argmin
> calculated once at the start of __do_execve_file() and change copy_strings
> to check bprm->p >= bprm->argmin.
> 
> The patch adds the new helper, prepare_arg_pages() which initializes
> bprm->argc/envc and bprm->argmin.
> 
> Signed-off-by: Oleg Nesterov 
> Acked-by: Kees Cook 

This patch results in various qemu boot failures in -next. Bisect logs
are attached. It looks like all nommu boots are failing. I bisected an385
and m68k, but xtensa (kc705-nommu) fails with the same error.

[5.416926] Run /sbin/init as init process
[5.419792] Failed to execute /sbin/init (error -7)
[5.421128] Run /sbin/init as init process
[5.423284] Starting init: /sbin/init exists but couldn't execute it (error 
-7)
[5.424747] Run /etc/init as init process
[5.428507] Run /bin/init as init process
[5.430478] Run /bin/sh as init process
[5.433179] Starting init: /bin/sh exists but couldn't execute it (error -7)

Guenter

---
an385:

# bad: [8c9733fd9806c71e7f2313a280f98cb3051f93df] Add linux-next specific files 
for 20181123
# good: [9ff01193a20d391e8dbce4403dd5ef87c7eaaca6] Linux 4.20-rc3
git bisect start 'HEAD' 'v4.20-rc3'
# good: [34c2101b4f765edf1b91c2837da9c60fbf9f6912] Merge remote-tracking branch 
'spi-nor/spi-nor/next'
git bisect good 34c2101b4f765edf1b91c2837da9c60fbf9f6912
# good: [15367a0657fc8027ff3466bf0202bde9f270259b] Merge remote-tracking branch 
'kgdb/kgdb-next'
git bisect good 15367a0657fc8027ff3466bf0202bde9f270259b
# good: [d29686ab179c34c5dbaac067a9effbeeb6a8073e] Merge remote-tracking branch 
'soundwire/next'
git bisect good d29686ab179c34c5dbaac067a9effbeeb6a8073e
# good: [3381311f9a261c9d2863f0d52e9499c88ccb1f44] Merge remote-tracking branch 
'cgroup/for-next'
git bisect good 3381311f9a261c9d2863f0d52e9499c88ccb1f44
# good: [b5bf9c4c28a25b74475c3f3d3fe6d5d737629ab7] Merge remote-tracking branch 
'pinctrl/for-next'
git bisect good b5bf9c4c28a25b74475c3f3d3fe6d5d737629ab7
# good: [a75f6952444a401f5464e977caff2a84118b66c9] fs/epoll: deal with 
wait_queue only once
git bisect good a75f6952444a401f5464e977caff2a84118b66c9
# good: [d83de16290eefb4372520a61744b11bddb6e8f7e] Merge remote-tracking branch 
'slimbus/for-next'
git bisect good d83de16290eefb4372520a61744b11bddb6e8f7e
# good: [f2d2d58687c8300fe4ebf09d78b26cd68c825df7] Merge remote-tracking branch 
'xarray/xarray'
git bisect good f2d2d58687c8300fe4ebf09d78b26cd68c825df7
# bad: [eac22eb65c3cc96d8cb2c7aac7dd816030a86955] ipc: allow boot time 
extension of IPCMNI from 32k to 8M
git bisect bad eac22eb65c3cc96d8cb2c7aac7dd816030a86955
# good: [fb95dde314b136e9598a56350b3de3994d272486] exec: increase 
BINPRM_BUF_SIZE to 256
git bisect good fb95dde314b136e9598a56350b3de3994d272486
# bad: [fb704fe633a3ca80a68282d5c3c665c4382f500b] 
exec-separate-mm_anonpages-and-rlimit_stack-accounting-checkpatch-fixes
git bisect bad fb704fe633a3ca80a68282d5c3c665c4382f500b
# bad: [eed684faf14610c2063c1f03e0a1f5ef56f23bb4] exec: separate MM_ANONPAGES 
and RLIMIT_STACK accounting
git bisect bad eed684faf14610c2063c1f03e0a1f5ef56f23bb4
# first bad commit: [eed684faf14610c2063c1f03e0a1f5ef56f23bb4] exec: separate 
MM_ANONPAGES and RLIMIT_STACK accounting

---
m68k:

# bad: [8c9733fd9806c71e7f2313a280f98cb3051f93df] Add linux-next specific files 
for 20181123
# good: [9ff01193a20d391e8dbce4403dd5ef87c7eaaca6] Linux 4.20-rc3
git bisect start 'HEAD' 'v4.20-rc3'
# good: [34c2101b4f765edf1b91c2837da9c60fbf9f6912] Merge remote-tracking branch 
'spi-nor/spi-nor/next'
git bisect good 34c2101b4f765edf1b91c2837da9c60fbf9f6912
# good: [15367a0657fc8027ff3466bf0202bde9f270259b] Merge remote-tracking branch 
'kgdb/kgdb-next'
git bisect good 15367a0657fc8027ff3466bf0202bde9f270259b
# good: [d29686ab179c34c5dbaac067a9effbeeb6a8073e] Merge remote-tracking branch 
'soundwire/next'
git bisect good d29686ab179c34c5dbaac067a9effbeeb6a8073e
# good: [3381311f9a261c9d2863f0d52e9499c88ccb1f44] Merge remote-tracking branch 
'cgroup/for-next'
git bisect good 3381311f9a261c9d2863f0d52e9499c88ccb1f44
# good: [b5bf9c4c28a25b74475c3f3d3fe6d5d737629ab7] Merge remote-tracking branch 
'pinctrl/for-next'
git bisect good b5bf9c4c28a25b74475c3f3d3fe6d5d737629ab7
# good: [a75f6952444a401f5464e977caff2a84118b66c9] fs/epoll: deal with 
wait_queue only once
git bisect good a75f6952444a401f5464e977caff2a84118b66c9
# good: [d83de16290eefb4372520a61744b11bddb6e8f7e] Merge remote-tracking bra

Re: [PATCH 0/7] ACPI HMAT memory sysfs representation

2018-11-23 Thread Dan Williams
On Thu, Nov 22, 2018 at 11:11 PM Anshuman Khandual
 wrote:
>
>
>
> On 11/22/2018 11:38 PM, Dan Williams wrote:
> > On Thu, Nov 22, 2018 at 3:52 AM Anshuman Khandual
> >  wrote:
> >>
> >>
> >>
> >> On 11/19/2018 11:07 PM, Dave Hansen wrote:
> >>> On 11/18/18 9:44 PM, Anshuman Khandual wrote:
>  IIUC NUMA re-work in principle involves these functional changes
> 
>  1. Enumerating compute and memory nodes in heterogeneous environment 
>  (short/medium term)
> >>>
> >>> This patch set _does_ that, though.
> >>>
>  2. Enumerating memory node attributes as seen from the compute nodes 
>  (short/medium term)
> >>>
> >>> It does that as well (a subset at least).
> >>>
> >>> It sounds like the subset that's being exposed is insufficient for yo
> >>> We did that because we think doing anything but a subset in sysfs will
> >>> just blow up sysfs:  MAX_NUMNODES is as high as 1024, so if we have 4
> >>> attributes, that's at _least_ 1024*1024*4 files if we expose *all*
> >>> combinations.
> >> Each permutation need not be a separate file inside all possible NODE X
> >> (/sys/devices/system/node/nodeX) directories. It can be a top level file
> >> enumerating various attribute values for a given (X, Y) node pair based
> >> on an offset something like /proc/pid/pagemap.
> >>
> >>>
> >>> Do we agree that sysfs is unsuitable for exposing attributes in this 
> >>> manner?
> >>>
> >>
> >> Yes, for individual files. But this can be worked around with an offset
> >> based access from a top level global attributes file as mentioned above.
> >> Is there any particular advantage of using individual files for each
> >> given attribute ? I was wondering that a single unsigned long (u64) will
> >> be able to pack 8 different attributes where each individual attribute
> >> values can be abstracted out in 8 bits.
> >
> > sysfs has a 4K limit, and in general I don't think there is much
> > incremental value to go describe the entirety of the system from sysfs
> > or anywhere else in the kernel for that matter. It's simply too much> 
> > information to reasonably consume. Instead the kernel can describe the
>
> I agree that it may be some amount of information to parse but is crucial
> for any task on a heterogeneous system to evaluate (probably re-evaluate
> if the task moves around) its memory and CPU binding at runtime to make
> sure it has got the right one.

Can you provide some more evidence for this statement? It seems that
not many applications even care about basic numa let alone specific
memory targeting, at least according to libnumactl users.

 dnf repoquery --whatrequires numactl-libs

The kernel is the arbiter of memory, something is broken if
applications *need* to take on this responsibility. Yes, there will be
applications that want to tune and override the default kernel
behavior, but this is the exception, not the rule. The applications
that tend to care about specific memories also tend to be purpose
built for a given platform, and that lessens their reliance on the
kernel to enumerate all properties.

> > coarse boundaries and some semblance of "best" access initiator for a
> > given target. That should cover the "80%" case of what applications
>
> The current proposal just assumes that the best one is the nearest one.
> This may be true for bandwidth and latency but may not be true for some
> other properties. This assumptions should not be there while defining
> new ABI.

In fact, I tend to agree with you, but in my opinion that's an
argument to expose even less, not more. If we start with something
minimal that can be extended over time we lessen the risk of over
exposing details that don't matter in practice.

We're in the middle of a bit of a disaster with the VmFlags export in
/proc/$pid/smaps precisely because the implementation was too
comprehensive and applications started depending on details that the
kernel does not want to guarantee going forward. So there is a real
risk of being too descriptive in an interface design.

> > want to discover, for the other "20%" we likely need some userspace
> > library that can go parse these platform specific information sources
> > and supplement the kernel view. I also think a simpler kernel starting
> > point gives us room to go pull in more commonly used attributes if it
> > turns out they are useful, and avoid going down the path of exporting
> > attributes that have questionable value in practice.
> >
>
> Applications can just query platform information right now and just use
> them for mbind() without requiring this new interface.

No, they can't today, at least not for the topology details that HMAT
is describing. The platform-firmware to numa-node translation is
currently not complete. At a minimum we need a listing of initiator
ids and target ids. For an ACPI platform that is the proximity-domain
to numa-node-id translation information. Once that translation is in
place then a userspace library can consult the platform-specific
infor

Re: [for-next][PATCH 11/18] s390/function_graph: Simplify with function_graph_entry()

2018-11-23 Thread Steven Rostedt
On Thu, 22 Nov 2018 07:43:19 +0100
Martin Schwidefsky  wrote:

> A quick test showed that the patch series works fine on s390.
> Acked-by: Martin Schwidefsky 

Thanks for testing, and the ack!

-- Steve


Re: [for-next][PATCH 08/18] parisc: function_graph: Simplify with function_graph_entry()

2018-11-23 Thread Steven Rostedt
On Fri, 23 Nov 2018 10:06:05 +0100
Helge Deller  wrote:


> > How should we proceed with this patch?  
> 
> My suggestion, although I didn't looked too much on it:
> Apply it to v4.9 and higher only.
> I think I started fixing trace functionality on parisc around 4.6,
> which is probably why applying it fails on v4.4 and v3.x

The problem is, if you backport the generic patches, it will completely
break any arch that isn't updated. This also includes the archs that
are no longer supported upstream, as they were not changed to handle
the generic updates either.

-- Steve


Re: [PATCH v3 2/2] proc: add /proc//arch_state

2018-11-23 Thread Dave Martin
On Thu, Nov 22, 2018 at 09:40:24AM +0800, Li, Aubrey wrote:
> On 2018/11/21 17:53, Peter Zijlstra wrote:
> > On Wed, Nov 21, 2018 at 09:19:36AM +0100, Peter Zijlstra wrote:
> >> On Wed, Nov 21, 2018 at 09:39:00AM +0800, Li, Aubrey wrote:
>  Also; you were going to shop around with the other architectures to see
>  what they want/need for this interface. I see nothing on that.
> 
> >>> I'm open for your suggestion, :)
> >>
> >> Well, we have linux-arch and the various maintainers are also listed in
> >> MAINTAINERS. Go forth and ask..
> > 
> > Ok, so I googled a wee bit (you could have too).
> > 
> > There's not that many architectures that build big hot chips
> > (powerpc,x86,arm64,s390) (mips, sparc64 and ia64 are pretty dead I
> > think, although the Fujitsu Sparc M10 X+/X SIMD looked like it could be
> > 'fun').
> > 
> > Of those, powerpc altivec doesn't seem to be very wide, but you'd have
> > to ask the power folks. Same for s390 z13.
> > 
> > The Fujitsu/ARM64-SVE stuff looks like it can be big and hot.
> > 
> > And RISC-V has was vector extention, but I don't think anybody is
> > actually building big hot versions of that just yet.
> > 
> Thanks Peter. Add more maintainers here.
> 
> On some x86 architectures, the tasks using simd instruction(AVX512 
> particularly)
> need to be dealt with specially against the tasks not using simd instruction.
> I proposed an interface to expose such CPU specific information for the user
> space tools to apply different scheduling policies.
> 
> The interface can be refined to be the format as /proc//status. Not sure
> if it's useful to any other architectures.
> 
> Welcome any comments.

For SVE:

We currently monitor SVE use by trapping only.  We also made an ABI
decision that a syscall throws away the task's SVE state -- this
falls out naturally from the fact that the SVE state is caller-save
for regular function calls in the AArch64 ABI.

There isn't an explicit means like VZEROUPPER for userspace to
mark the SVE state as non-live without entering the kernel today.

Currently I expose as little detail to userspace as possible regarding
how/when SVE is enabled/disabled or used.


For the /proc interface:

It would be nice to expose some information to userspace about when/
where major hardware functional units are in use, but beyond the
information already supplied by hardware perf events, it's not
obvious what should be exposed.

AFAICT, the exposed flags would be partly an arbitrary artifact of
kernel implementation details: i.e., how often and when the kernel
saves/restores the task's state may affect the pattern of observed
values in non-trivial ways.

For SVE today, a task that does a lot of syscalls may appear to be using
SVE less than a second task that does fewer syscalls but is otherwise
identical -- simply because a syscall is our only way to detect that
SVE is not in use today.


This kind of issue means that userspace may struggle to make good
decisions using this data: instead it's going to rely on some kind of
tuning which may become wrong as soon as the workload, kernel version
or hardware changes.


A /proc//file would need to be polled (which doesn't sound great)
and also suffers from all the usual /proc raciness.

Cheers
---Dave


Re: [PATCH v1 3/3] arm64: dts: mt7622: Drop the general purpose timer node

2018-11-23 Thread Matthias Brugger



On 12/11/2018 02:28, Ryder Lee wrote:
> The crash http://termbin.com/zitb is caused by the timer register
> into system in early pahse during kernel boot, but the clock
> sources didn't get ready at that time.
> 
> A better way is to switch to use CLK_OF_DECLARE() in driver for things
> that need them early, but this node is actually useless in MT7622.
> So we drop it.

I suppose the root cause is, that the driver doesn't check the error
timer_of_init returned.

Would you mind to test the following patch to see if this fixes the problem?

--->8
>From be91e56ed527261137335af340845eb3dd3dd33a Mon Sep 17 00:00:00 2001
From: Matthias Brugger 
Date: Fri, 23 Nov 2018 17:04:08 +0100
Subject: [PATCH] clocksource/drivers/timer-mediatek: error out on probe defer

If the clocks are not yet present, because they are loaded after
the timer is initialized, we are unable to boot.
Check if the return value of timer_of_init and return on EPROBE_DEFER.

Error seen is:
[0.008337] Failed to get clock for /timer@10004000
[0.013435] WARNING: CPU: 0 PID: 0 at ../drivers/clk/clk.c:3615
__clk_put+0xf0/0x120
[0.021453] Modules linked in:
[0.024614] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
4.20.0-rc1-00063-g2859d9abd7e8 #2
[0.032902] Hardware name: MediaTek MT7622 RFB1 board (DT)
[0.038580] pstate: 2085 (nzCv daIf -PAN -UAO)
[0.043538] pc : __clk_put+0xf0/0x120
[0.047328] lr : clk_put+0xc/0x18
[0.050753] sp : 09163ed0
[0.054178] x29: 09163ed0 x28: 41010018
[0.059676] x27:  x26: 
[0.065174] x25: 090cb008 x24: 80001dfd7800
[0.070672] x23: 08f934b8 x22: 80001dfdeae0
[0.076169] x21: 80001dfdeae0 x20: fdfb
[0.081668] x19: 0929ac40 x18: 
[0.087166] x17:  x16: 
[0.092664] x15: 091696c8 x14: 892cd70f
[0.098162] x13: 092cd71d x12: 09181348
[0.103661] x11: 09181000 x10: 05f5e0ff
[0.109159] x9 : 0040 x8 : 09181400
[0.114657] x7 : 80001c800270 x6 : 
[0.120155] x5 : 80001c800248 x4 : 
[0.125653] x3 :  x2 : 
[0.131151] x1 :  x0 : fdfb
[0.136649] Call trace:
[0.139175]  __clk_put+0xf0/0x120
[0.142605]  timer_of_cleanup+0x60/0x7c
[0.146572]  mtk_gpt_init+0x18c/0x1a0
[0.150359]  timer_probe+0x74/0x10c
[0.153969]  time_init+0x14/0x44
[0.157307]  start_kernel+0x29c/0x414
[0.161098] ---[ end trace c3137b005300b618 ]---

Fixes: a0858f937960 ("clocksource/drivers/timer-mediatek: Convert the driver to
timer-of")
Fixes: e3af677607d9 ("clocksource/drivers/timer-mediatek: Add support for system
timer")
Signed-off-by: Matthias Brugger 
---
 drivers/clocksource/timer-mediatek.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/clocksource/timer-mediatek.c
b/drivers/clocksource/timer-mediatek.c
index eb10321f8517..d40c77a65b08 100644
--- a/drivers/clocksource/timer-mediatek.c
+++ b/drivers/clocksource/timer-mediatek.c
@@ -276,8 +276,12 @@ static int __init mtk_syst_init(struct device_node *node)
to.of_irq.handler = mtk_syst_handler;

ret = timer_of_init(node, &to);
-   if (ret)
+   if (ret) {
+   if (ret == -EPROBE_DEFER)
+   return ret;
+
goto err;
+   }

clockevents_config_and_register(&to.clkevt, timer_of_rate(&to),
TIMER_SYNC_TICKS, 0x);
@@ -301,8 +305,12 @@ static int __init mtk_gpt_init(struct device_node *node)
to.of_irq.handler = mtk_gpt_interrupt;

ret = timer_of_init(node, &to);
-   if (ret)
+   if (ret) {
+   if (ret == -EPROBE_DEFER)
+   return ret;
+
goto err;
+   }

/* Configure clock source */
mtk_gpt_setup(&to, TIMER_CLK_SRC, GPT_CTRL_OP_FREERUN);
-- 
2.19.1


Re: [RFC PATCH v4 1/5] glibc: Perform rseq(2) registration at nptl init and thread creation

2018-11-23 Thread Mathieu Desnoyers
- On Nov 23, 2018, at 9:28 AM, Rich Felker dal...@libc.org wrote:
[...]
> 
> Absolutely. As long as it's in libc, implicit destruction will happen.
> Actually I think the glibc code shound unconditionally unregister the
> rseq address at exit (after blocking signals, so no application code
> can run) in case a third-party rseq library was linked and failed to
> do so before thread exit (e.g. due to mismatched ref counts) rather
> than respecting the reference count, since it knows it's the last
> user. This would make potentially-buggy code safer.

OK, let me go ahead with a few ideas/questions along that path.

Let's say our stated goal is to let the "exit" system call from the
glibc thread exit path perform rseq unregistration (without explicit
unregistration beforehand). Let's look at what we need.

First, we need the TLS area to be valid until the exit system call
is invoked by the thread. If glibc defines __rseq_abi as a weak symbol,
I'm not entirely sure we can guarantee the IE model if another library
gets its own global-dynamic weak symbol elected at execution time. Would
it be better to switch to a "strong" symbol for the glibc __rseq_abi
rather than weak ?

If we rely on implicit unregistration by the exit system call, then we
need to be really sure that the __rseq_abi TLS area can be accessed
(load and store) from kernel preemption up to the point where exit
is invoked. If we have that guarantee with the IE model, then we should
be fine. This means the memory area with the __rseq_abi sits can only
be re-used after the tid field in the TLB is set to 0 by the exit system
call. Looking at allocatestack.c, it looks like the FREE_P () macro
does exactly that.

With all the above respected, we could rely on implicit rseq unregistration
by thread exit rather than do an explicit unregister. We could still need
to increment the __rseq_refcount upon thread start however, so we can
ensure early adopter libraries won't unregister rseq while glibc is using
it. No need to bring the refcount back to 0 in glibc though.

There has been presumptions about signals being blocked when the thread
exits throughout this email thread. Out of curiosity, what code is
responsible for disabling signals in this situation ? Related to this,
is it valid to access a IE model TLS variable from a signal handler at
_any_ point where the signal handler nests over thread's execution ?
This includes early start and just before invoking the exit system call.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: Subject: [PATCH v4 0/4] mtd: rawnand: ams-delta: Use GPIO API for data I/O

2018-11-23 Thread Tony Lindgren
* Boris Brezillon  [181121 14:56]:
> On Wed, 21 Nov 2018 12:08:02 +0100
> Janusz Krzysztofik  wrote:
> 
> > Finalize implementation of the idea suggested by Artem Bityutskiy and
> > Tony Lindgren, described in commit b027274d2e3a ("mtd: ams-delta: fix
> > request_mem_region() failure"). Use pure GPIO consumer API, as reqested
> > by Boris Brezillon.
> > 
> > Janusz Krzysztofik (4):
> >   ARM: OMAP1: ams-delta: Provide GPIO lookup table for NAND data port
> >   mtd: rawnand: ams-delta: Request data port GPIO resource
> >   mtd: rawnand: ams-delta: Use GPIO API for data I/O
> >   ARM: OMAP1: ams-delta: Drop obsolete NAND resources
> > 
> >  arch/arm/mach-omap1/board-ams-delta.c |   22 ++
> >  drivers/mtd/nand/raw/ams-delta.c  |  120 
> > +++---
> >  2 files changed, 80 insertions(+), 62 deletions(-)
> > 
> > Performance on Amstrad Delta is now acceptable after recent extensions
> > to GPIO API and rawnanad enhancements.
> > 
> > Series intented to be merged via mtd tree, should not conflict with
> > other OMAP1 patches submitted for 4.21.
> 
> We'll prepare an immutable tag, just in case.

Sounds good to me thanks:

Acked-by: Tony Lindgren 


Re: [PATCH v4 1/4] ARM: OMAP1: ams-delta: Provide GPIO lookup table for NAND data port

2018-11-23 Thread Tony Lindgren
* Janusz Krzysztofik  [181121 11:06]:
> Data port used by Amstrad Delta NAND driver is actually an OMAP MPUIO
> device, already under control of gpio-omap driver.  The NAND driver
> gets access to the port by ioremapping it and performs read/write
> operations.  That is done without any proteciton from other users
> legally manipulating the port pins over GPIO API.
> 
> The plan is to convert the driver to access the port over GPIO consumer
> API.  Before that is implemented, the driver can already obtain
> exclusive access to the port by requesting an array of its GPIO
> descriptors.
> 
> Add respective entries to the NAND GPIO lookup table.
> 
> Signed-off-by: Janusz Krzysztofik 
> Reviewed-by: Boris Brezillon 
> Reviewed-by: Linus Walleij 

Best to merge this along with the MFD patches:

Acked-by: Tony Lindgren 


[PATCH] KVM: VMX: re-add ple_gap module parameter

2018-11-23 Thread Luiz Capitulino


Apparently, the ple_gap parameter was accidentally removed
by commit c8e88717cfc6b36bedea22368d97667446318291. Add it
back.

Signed-off-by: Luiz Capitulino 
---
 arch/x86/kvm/vmx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4555077d69ce..be6f13f1c25f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -174,6 +174,7 @@ module_param_named(preemption_timer, 
enable_preemption_timer, bool, S_IRUGO);
  * refer SDM volume 3b section 21.6.13 & 22.1.3.
  */
 static unsigned int ple_gap = KVM_DEFAULT_PLE_GAP;
+module_param(ple_gap, uint, 0444);
 
 static unsigned int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
 module_param(ple_window, uint, 0444);
-- 
2.17.2



Re: [PATCH v4 2/4] namei: O_BENEATH-style path resolution flags

2018-11-23 Thread Aleksa Sarai
On 2018-11-23, Jürg Billeter  wrote:
> On Tue, 2018-11-13 at 01:26 +1100, Aleksa Sarai wrote:
> > * O_BENEATH: Disallow "escapes" from the starting point of the
> >   filesystem tree during resolution (you must stay "beneath" the
> >   starting point at all times). Currently this is done by disallowing
> >   ".." and absolute paths (either in the given path or found during
> >   symlink resolution) entirely, as well as all "magic link" jumping.
> 
> With open_tree(2) and OPEN_TREE_CLONE, will O_BENEATH still be
> necessary? As I understand it, O_BENEATH could be replaced by a much
> simpler flag that only disallows absolute paths (incl. absolute
> symlinks). And it would have the benefit that you can actually pass the
> tree/directory fd to another process and escaping would not be possible
> even if that other process doesn't use O_BENEATH (after calling
> mount_setattr(2) to make sure it's locked down).
> 
> This approach would also make it easy to restrict writes via a cloned
> tree/directory fd by marking it read-only via mount_setattr(2) (and
> locking down the read-only flag). This would again be especially useful
> when passing tree/directory fds across processes, or for voluntary
> self-lockdown within a process for robustness against security bugs.
> 
> This wouldn't affect any of the other flags in this patch. And for full
> equivalence to O_BENEATH you'd have to use O_NOMAGICLINKS in addition
> to O_NOABSOLUTE, or whatever that new flag would be called.
> 
> Or is OPEN_TREE_CLONE too expensive for this use case? Or is there
> anything else I'm missing?

OPEN_TREE_CLONE currently requires CAP_SYS_ADMIN in mnt_ns->user_ns,
which requires a fork and unshare -- or at least a vfork and some other
magic -- at which point we're back to just doing a pivot_root(2) for
most operations.

I think open_tree(2) -- which I really should sit down and play around
with -- would be an interesting way of doing O_BENEATH, but I think
you'd still need to have the same race protections we have in the
current O_BENEATH proposal to handle "..".

So really you'd be using open_tree(OPEN_TREE_CLONE) just so that you can
use the "path.mnt" setting code, which I'm not sure is the best way of
doing it (plus the other interesting ideas which you get with the other
mount API changes).

But I am quite hopeful for what cool things we'll be able to make using
the new mount API.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH



signature.asc
Description: PGP signature


[PATCH] xfs: clean up indentation issues, remove an unwanted space

2018-11-23 Thread Colin King
From: Colin Ian King 

There is a statement that has an unwanted space in the indentation.
Remove it.

Signed-off-by: Colin Ian King 
---
 fs/xfs/xfs_inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 05db9540e459..f21756751540 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2242,7 +2242,7 @@ xfs_ifree_cluster(
 * want it to fail. We can acheive this by adding a write
 * verifier to the buffer.
 */
-bp->b_ops = &xfs_inode_buf_ops;
+   bp->b_ops = &xfs_inode_buf_ops;
 
/*
 * Walk the inodes already attached to the buffer and mark them
-- 
2.19.1



Re: [PATCH 2/4] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE

2018-11-23 Thread Daniel Lezcano
On 23/11/2018 17:20, Sudeep Holla wrote:
> On Fri, Nov 23, 2018 at 05:04:18PM +0100, Daniel Lezcano wrote:
>> On 23/11/2018 14:58, Sudeep Holla wrote:
>>> On Mon, Oct 29, 2018 at 05:23:18PM +0100, Daniel Lezcano wrote:
 The mutex protects a per_cpu variable access. The potential race can
 happen only when the cpufreq governor module is loaded and at the same
 time the cpu capacity is changed in the sysfs.

>>>
>>> I wonder if we really need that sysfs entry to be writable. For some
>>> reason, I had assumed it's read only, obviously it's not. I prefer to
>>> make it RO if there's no strong reason other than debug purposes.
>>
>> Are you suggesting to remove the READ_ONCE/WRITE_ONCE patch and set the
>> sysfs file read-only ?
>>
> 
> Just to be sure, if we retain RW capability we still need to fix the
> race you are pointing out.
> 
> However I just don't see the need for RW cpu_capacity sysfs and hence
> asking the reason here. IIRC I had pointed this out long back(not sure
> internally or externally) but seemed to have missed the version that got
> merged. So I am just asking if we really need write capability given that
> it has known issues.
> 
> If user-space starts writing the value to influence the scheduler, then
> it makes it difficult for kernel to change the way it uses the
> cpu_capacity in future.
> 
> Sorry if there's valid usecase and I am just making noise here.

It's ok [added Juri Lelli]

I've been through the history:

commit be8f185d8af4dbd450023a42a48c6faa8cbcdfe6
Author: Juri Lelli 
Date:   Thu Nov 3 05:40:18 2016 +

arm64: add sysfs cpu_capacity attribute

Add a sysfs cpu_capacity attribute with which it is possible to read and
write (thus over-writing default values) CPUs capacity. This might be
useful in situations where values needs changing after boot.

The new attribute shows up as:

 /sys/devices/system/cpu/cpu*/cpu_capacity

Cc: Will Deacon 
Cc: Mark Brown 
Cc: Sudeep Holla 
Signed-off-by: Juri Lelli 
Signed-off-by: Catalin Marinas 

Juri do you have a use case where we want to override the capacity?

Shall we switch the sysfs attribute to read-only?


-- 
  Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog



[PATCH] irqchhip: Convert to using %pOFn instead of device_node.name

2018-11-23 Thread Yangtao Li
In preparation to remove the node name pointer from struct device_node,
convert printf users to use the %pOFn format specifier.

Signed-off-by: Yangtao Li 
---
 drivers/irqchip/irq-mscc-ocelot.c |  6 +++---
 drivers/irqchip/irq-stm32-exti.c  |  6 +++---
 drivers/irqchip/irq-tango.c   | 10 +-
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/irqchip/irq-mscc-ocelot.c 
b/drivers/irqchip/irq-mscc-ocelot.c
index b63e40c00a02..88143c0b700c 100644
--- a/drivers/irqchip/irq-mscc-ocelot.c
+++ b/drivers/irqchip/irq-mscc-ocelot.c
@@ -72,7 +72,7 @@ static int __init ocelot_irq_init(struct device_node *node,
domain = irq_domain_add_linear(node, OCELOT_NR_IRQ,
   &irq_generic_chip_ops, NULL);
if (!domain) {
-   pr_err("%s: unable to add irq domain\n", node->name);
+   pr_err("%pOFn: unable to add irq domain\n", node);
return -ENOMEM;
}
 
@@ -80,14 +80,14 @@ static int __init ocelot_irq_init(struct device_node *node,
 "icpu", handle_level_irq,
 0, 0, 0);
if (ret) {
-   pr_err("%s: unable to alloc irq domain gc\n", node->name);
+   pr_err("%pOFn: unable to alloc irq domain gc\n", node);
goto err_domain_remove;
}
 
gc = irq_get_domain_generic_chip(domain, 0);
gc->reg_base = of_iomap(node, 0);
if (!gc->reg_base) {
-   pr_err("%s: unable to map resource\n", node->name);
+   pr_err("%pOFn: unable to map resource\n", node);
ret = -ENOMEM;
goto err_gc_free;
}
diff --git a/drivers/irqchip/irq-stm32-exti.c b/drivers/irqchip/irq-stm32-exti.c
index 0a2088e12d96..363385750fa7 100644
--- a/drivers/irqchip/irq-stm32-exti.c
+++ b/drivers/irqchip/irq-stm32-exti.c
@@ -678,8 +678,8 @@ static int __init stm32_exti_init(const struct 
stm32_exti_drv_data *drv_data,
domain = irq_domain_add_linear(node, drv_data->bank_nr * IRQS_PER_BANK,
   &irq_exti_domain_ops, NULL);
if (!domain) {
-   pr_err("%s: Could not register interrupt domain.\n",
-  node->name);
+   pr_err("%pOFn: Could not register interrupt domain.\n",
+  node);
ret = -ENOMEM;
goto out_unmap;
}
@@ -768,7 +768,7 @@ __init stm32_exti_hierarchy_init(const struct 
stm32_exti_drv_data *drv_data,
  host_data);
 
if (!domain) {
-   pr_err("%s: Could not register exti domain.\n", node->name);
+   pr_err("%pOFn: Could not register exti domain.\n", node);
ret = -ENOMEM;
goto out_unmap;
}
diff --git a/drivers/irqchip/irq-tango.c b/drivers/irqchip/irq-tango.c
index 580e2d72b9ba..ae28d8648679 100644
--- a/drivers/irqchip/irq-tango.c
+++ b/drivers/irqchip/irq-tango.c
@@ -184,11 +184,11 @@ static int __init tangox_irq_init(void __iomem *base, 
struct resource *baseres,
 
irq = irq_of_parse_and_map(node, 0);
if (!irq)
-   panic("%s: failed to get IRQ", node->name);
+   panic("%pOFn: failed to get IRQ", node);
 
err = of_address_to_resource(node, 0, &res);
if (err)
-   panic("%s: failed to get address", node->name);
+   panic("%pOFn: failed to get address", node);
 
chip = kzalloc(sizeof(*chip), GFP_KERNEL);
chip->ctl = res.start - baseres->start;
@@ -196,12 +196,12 @@ static int __init tangox_irq_init(void __iomem *base, 
struct resource *baseres,
 
dom = irq_domain_add_linear(node, 64, &irq_generic_chip_ops, chip);
if (!dom)
-   panic("%s: failed to create irqdomain", node->name);
+   panic("%pOFn: failed to create irqdomain", node);
 
err = irq_alloc_domain_generic_chips(dom, 32, 2, node->name,
 handle_level_irq, 0, 0, 0);
if (err)
-   panic("%s: failed to allocate irqchip", node->name);
+   panic("%pOFn: failed to allocate irqchip", node);
 
tangox_irq_domain_init(dom);
 
@@ -219,7 +219,7 @@ static int __init tangox_of_irq_init(struct device_node 
*node,
 
base = of_iomap(node, 0);
if (!base)
-   panic("%s: of_iomap failed", node->name);
+   panic("%pOFn: of_iomap failed", node);
 
of_address_to_resource(node, 0, &res);
 
-- 
2.17.0



[PATCH] ufs: clean up indentation issues, replace spaces with tabs

2018-11-23 Thread Colin King
From: Colin Ian King 

There is a hunk of code where spaces are used for identations, and it
off by one in an editor. Clean this up by replacing them with tabs.
Also remove one blank line.

Signed-off-by: Colin Ian King 
---
 fs/ufs/inode.c | 87 +-
 1 file changed, 43 insertions(+), 44 deletions(-)

diff --git a/fs/ufs/inode.c b/fs/ufs/inode.c
index c843ec858cf7..c4fb99eddff4 100644
--- a/fs/ufs/inode.c
+++ b/fs/ufs/inode.c
@@ -1066,52 +1066,51 @@ static int ufs_alloc_lastblock(struct inode *inode, 
loff_t size)
 
lastpage = ufs_get_locked_page(mapping, lastfrag >>
   (PAGE_SHIFT - inode->i_blkbits));
-   if (IS_ERR(lastpage)) {
-   err = -EIO;
-   goto out;
-   }
-
-   end = lastfrag & ((1 << (PAGE_SHIFT - inode->i_blkbits)) - 1);
-   bh = page_buffers(lastpage);
-   for (i = 0; i < end; ++i)
-   bh = bh->b_this_page;
-
-
-   err = ufs_getfrag_block(inode, lastfrag, bh, 1);
-
-   if (unlikely(err))
-  goto out_unlock;
-
-   if (buffer_new(bh)) {
-  clear_buffer_new(bh);
-  clean_bdev_bh_alias(bh);
-  /*
-   * we do not zeroize fragment, because of
-   * if it maped to hole, it already contains zeroes
-   */
-  set_buffer_uptodate(bh);
-  mark_buffer_dirty(bh);
-  set_page_dirty(lastpage);
-   }
-
-   if (lastfrag >= UFS_IND_FRAGMENT) {
-  end = uspi->s_fpb - ufs_fragnum(lastfrag) - 1;
-  phys64 = bh->b_blocknr + 1;
-  for (i = 0; i < end; ++i) {
-  bh = sb_getblk(sb, i + phys64);
-  lock_buffer(bh);
-  memset(bh->b_data, 0, sb->s_blocksize);
-  set_buffer_uptodate(bh);
-  mark_buffer_dirty(bh);
-  unlock_buffer(bh);
-  sync_dirty_buffer(bh);
-  brelse(bh);
-  }
-   }
+   if (IS_ERR(lastpage)) {
+   err = -EIO;
+   goto out;
+   }
+
+   end = lastfrag & ((1 << (PAGE_SHIFT - inode->i_blkbits)) - 1);
+   bh = page_buffers(lastpage);
+   for (i = 0; i < end; ++i)
+   bh = bh->b_this_page;
+
+   err = ufs_getfrag_block(inode, lastfrag, bh, 1);
+
+   if (unlikely(err))
+   goto out_unlock;
+
+   if (buffer_new(bh)) {
+   clear_buffer_new(bh);
+   clean_bdev_bh_alias(bh);
+   /*
+* we do not zeroize fragment, because of
+* if it maped to hole, it already contains zeroes
+*/
+   set_buffer_uptodate(bh);
+   mark_buffer_dirty(bh);
+   set_page_dirty(lastpage);
+   }
+
+   if (lastfrag >= UFS_IND_FRAGMENT) {
+   end = uspi->s_fpb - ufs_fragnum(lastfrag) - 1;
+   phys64 = bh->b_blocknr + 1;
+   for (i = 0; i < end; ++i) {
+   bh = sb_getblk(sb, i + phys64);
+   lock_buffer(bh);
+   memset(bh->b_data, 0, sb->s_blocksize);
+   set_buffer_uptodate(bh);
+   mark_buffer_dirty(bh);
+   unlock_buffer(bh);
+   sync_dirty_buffer(bh);
+   brelse(bh);
+   }
+   }
 out_unlock:
-   ufs_put_locked_page(lastpage);
+   ufs_put_locked_page(lastpage);
 out:
-   return err;
+   return err;
 }
 
 static void ufs_truncate_blocks(struct inode *inode)
-- 
2.19.1



Re: [PATCH v4 2/4] namei: O_BENEATH-style path resolution flags

2018-11-23 Thread Aleksa Sarai
On 2018-11-24, Aleksa Sarai  wrote:
> > >> On Tue, 2018-11-13 at 01:26 +1100, Aleksa Sarai wrote:
> > >> * O_BENEATH: Disallow "escapes" from the starting point of the
> > >>  filesystem tree during resolution (you must stay "beneath" the
> > >>  starting point at all times). Currently this is done by disallowing
> > >>  ".." and absolute paths (either in the given path or found during
> > >>  symlink resolution) entirely, as well as all "magic link" jumping.
> > >
> > > With open_tree(2) and OPEN_TREE_CLONE, will O_BENEATH still be
> > > necessary?
> > 
> > This discussion reminds me of something I’m uncomfortable with in the
> > current patches: currently, most of the O_ flags determine some
> > property of the returned opened file.  The new O_ flags you're adding
> > don't -- instead, they affect the lookup of the file.  So O_BENEATH
> > doesn't return a descriptor that can only be used to loop up files
> > beneath it -- it just controls whether open(2) succeeds or fails.  It
> > might be nice for the naming of the flags to reflect this.
> 
> I agree that there is something quite weird about having path resolution
> flags in an *open* syscall. The main reason why it's linked to open is
> because that's the only way to get O_PATH descriptors (which is what you
> would use for most of the extended operations we need -- as well as
> reading+writing to files which is what most users would do with this).
> 
> And I think O_PATH is another example of an open flag that is just odd
> in how it changes the semantics of using open(2).
> 
> One of the ideas I pitched in an earlier mail was a hypothetical
> resolveat(2) -- which would just be a new way of getting an O_PATH
> descriptor. This way, we wouldn't be using up more O_* flag bits with
> this feature and we'd be able to play with more radical semantic changes
> in the future. I can outline these here if you like, but it's a bit of a
> long discussion and I'd prefer not to derail things too much if
> resolveat(2) is definitely out of the question.

Sorry, one thing I forgot to mention about returning descriptors that
can only look up files beneath it -- while I think this would be very
useful, I'd be worried about jumping into chroot(2) territory where now
you are giving userspace the opportunity to try to create nested
"beneathfds" and so on. I do think it would be quite useful and
interesting though, but I'm not entirely sure how doable it would be
with the current namei infrastructure.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH



signature.asc
Description: PGP signature


Re: Task group cleanups and optimizations (was: Re: [RFC 00/60] Coscheduling for Linux)

2018-11-23 Thread Frederic Weisbecker
On Tue, Sep 18, 2018 at 03:22:13PM +0200, Jan H. Schönherr wrote:
> On 09/17/2018 11:48 AM, Peter Zijlstra wrote:
> > Right, so the whole bandwidth thing becomes a pain; the simplest
> > solution is to detect the throttle at task-pick time, dequeue and try
> > again. But that is indeed quite horrible.
> > 
> > I'm not quite sure how this will play out.
> > 
> > Anyway, if we pull off this flattening feat, then you can no longer use
> > the hierarchy for this co-scheduling stuff.
> 
> Yeah. I might be a bit biased towards keeping or at least not fully throwing 
> away
> the nesting of CFS runqueues. ;)

One detail here, is that hierarchical task group a strong requirement for 
cosched
or could you live with it flattened in the end?


Re: omap5 fixing palmas IRQ_TYPE_NONE warning leads to gpadc timeouts

2018-11-23 Thread Tony Lindgren
* Jon Hunter  [181120 11:14]:
> On 19/11/2018 17:14, Tony Lindgren wrote:
> > Well so commit 7e9d474954f4 ("ARM: tegra: Correct polarity for
> > Tegra114 PMIC interrupt") states that tegra114 inverts the
> > polarity of the PMIC interrupt. So adding Jon and Thierry to Cc.
> 
> Yes Tegra can invert the polarity of the PMIC interrupt.

So is there some IP on Tegra called "Tegra PMC" that is
inverting the interrupt? Or is the "Tegra PMC" that commit
7e9d474954f4 mentions just the palmas configuration for
inverting the interrupt?

The problem I'm having is With omap5 where I can only get the
PMIC interrupts working with IRQ_TYPE_LEVEL_HIGH if
PALMAS_POLARITY_CTRL_INT_POLARITY is not set unlike for
Tegra.

Regards,

Tony


Re: [PATCH v4 2/4] namei: O_BENEATH-style path resolution flags

2018-11-23 Thread Aleksa Sarai
On 2018-11-23, Andy Lutomirski  wrote:
> > On Nov 23, 2018, at 5:10 AM, Jürg Billeter  wrote:
> >
> > Hi Aleksa,
> >
> >> On Tue, 2018-11-13 at 01:26 +1100, Aleksa Sarai wrote:
> >> * O_BENEATH: Disallow "escapes" from the starting point of the
> >>  filesystem tree during resolution (you must stay "beneath" the
> >>  starting point at all times). Currently this is done by disallowing
> >>  ".." and absolute paths (either in the given path or found during
> >>  symlink resolution) entirely, as well as all "magic link" jumping.
> >
> > With open_tree(2) and OPEN_TREE_CLONE, will O_BENEATH still be
> > necessary?
> 
> This discussion reminds me of something I’m uncomfortable with in the
> current patches: currently, most of the O_ flags determine some
> property of the returned opened file.  The new O_ flags you're adding
> don't -- instead, they affect the lookup of the file.  So O_BENEATH
> doesn't return a descriptor that can only be used to loop up files
> beneath it -- it just controls whether open(2) succeeds or fails.  It
> might be nice for the naming of the flags to reflect this.

I agree that there is something quite weird about having path resolution
flags in an *open* syscall. The main reason why it's linked to open is
because that's the only way to get O_PATH descriptors (which is what you
would use for most of the extended operations we need -- as well as
reading+writing to files which is what most users would do with this).

And I think O_PATH is another example of an open flag that is just odd
in how it changes the semantics of using open(2).

One of the ideas I pitched in an earlier mail was a hypothetical
resolveat(2) -- which would just be a new way of getting an O_PATH
descriptor. This way, we wouldn't be using up more O_* flag bits with
this feature and we'd be able to play with more radical semantic changes
in the future. I can outline these here if you like, but it's a bit of a
long discussion and I'd prefer not to derail things too much if
resolveat(2) is definitely out of the question.

> I also don't love that we have some magic AT_ flags that work with
> some syscalls and some magic O_ flags that work with others.

I also completely agree. I think that we should have a discussion about
the long-term plan of syscall flags because it's starting to get a
little bit crazy:

 * Every "get an fd" syscall has its own brand of O_CLOEXEC. Thankfully,
   many of them use the same value (except for memfd_create(2) and a few
   other examples).
 * AT_* was supposed to be generic across all *at(2) syscalls, but there
   are several cases where this isn't really true anymore.

   * renameat2(2) only supports RENAME_* flags.
   * openat(2) supports only O_* flags.
   * Most AT_* flags have O_* counterparts (or are even more of a mess
 such as with {AT_SYMLINK_FOLLOW,AT_SYMLINK_NOFOLLOW,O_NOFOLLOW}).
   * statx(2) added AT_STATX_* flags, making AT_* no longer generic.

(Also I just want to mention something I noticed while writing this
patch -- openat(2) violates one of the kernel "golden rules" -- that you
reject unknown flags. openat(2) will silently ignore unknown flag bits.
I'm sure there's a really good reason for this, but it's another flag
oddity that I felt fit here.)

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH



signature.asc
Description: PGP signature


[PATCH v2] module: make it clearer when we're handling kallsyms symbols vs exported symbols

2018-11-23 Thread Jessica Yu
The module loader internally works with both exported symbols
represented as struct kernel_symbol, as well as Elf symbols from a
module's symbol table. It's hard to distinguish sometimes which type of
symbol we're handling given that some helper function names are not
consistent or helpful. Take get_ksymbol() for instance - are we
looking for an exported symbol or a kallsyms symbol here? Or symname()
and kernel_symbol_name() - which function handles an exported symbol and
which one an Elf symbol?

Clean up and unify the function naming scheme a bit to make it clear
which kind of symbol we're handling. This change only affects static
functions internal to the module loader.

Signed-off-by: Jessica Yu 
---

v2: renamed kallsyms_find_* funcs to find_kallsyms_* to follow the
already existing _ naming convention in module.c

 kernel/module.c | 78 -
 1 file changed, 44 insertions(+), 34 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 49a405891587..1b5edf78694c 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -495,9 +495,9 @@ struct find_symbol_arg {
const struct kernel_symbol *sym;
 };
 
-static bool check_symbol(const struct symsearch *syms,
-struct module *owner,
-unsigned int symnum, void *data)
+static bool check_exported_symbol(const struct symsearch *syms,
+ struct module *owner,
+ unsigned int symnum, void *data)
 {
struct find_symbol_arg *fsa = data;
 
@@ -555,9 +555,9 @@ static int cmp_name(const void *va, const void *vb)
return strcmp(a, kernel_symbol_name(b));
 }
 
-static bool find_symbol_in_section(const struct symsearch *syms,
-  struct module *owner,
-  void *data)
+static bool find_exported_symbol_in_section(const struct symsearch *syms,
+   struct module *owner,
+   void *data)
 {
struct find_symbol_arg *fsa = data;
struct kernel_symbol *sym;
@@ -565,13 +565,14 @@ static bool find_symbol_in_section(const struct symsearch 
*syms,
sym = bsearch(fsa->name, syms->start, syms->stop - syms->start,
sizeof(struct kernel_symbol), cmp_name);
 
-   if (sym != NULL && check_symbol(syms, owner, sym - syms->start, data))
+   if (sym != NULL && check_exported_symbol(syms, owner,
+sym - syms->start, data))
return true;
 
return false;
 }
 
-/* Find a symbol and return it, along with, (optional) crc and
+/* Find an exported symbol and return it, along with, (optional) crc and
  * (optional) module which owns it.  Needs preempt disabled or module_mutex. */
 const struct kernel_symbol *find_symbol(const char *name,
struct module **owner,
@@ -585,7 +586,7 @@ const struct kernel_symbol *find_symbol(const char *name,
fsa.gplok = gplok;
fsa.warn = warn;
 
-   if (each_symbol_section(find_symbol_in_section, &fsa)) {
+   if (each_symbol_section(find_exported_symbol_in_section, &fsa)) {
if (owner)
*owner = fsa.owner;
if (crc)
@@ -2198,7 +2199,7 @@ EXPORT_SYMBOL_GPL(__symbol_get);
  *
  * You must hold the module_mutex.
  */
-static int verify_export_symbols(struct module *mod)
+static int verify_exported_symbols(struct module *mod)
 {
unsigned int i;
struct module *owner;
@@ -2519,10 +2520,10 @@ static void free_modinfo(struct module *mod)
 
 #ifdef CONFIG_KALLSYMS
 
-/* lookup symbol in given range of kernel_symbols */
-static const struct kernel_symbol *lookup_symbol(const char *name,
-   const struct kernel_symbol *start,
-   const struct kernel_symbol *stop)
+/* Lookup exported symbol in given range of kernel_symbols */
+static const struct kernel_symbol *lookup_exported_symbol(const char *name,
+ const struct 
kernel_symbol *start,
+ const struct 
kernel_symbol *stop)
 {
return bsearch(name, start, stop - start,
sizeof(struct kernel_symbol), cmp_name);
@@ -2533,9 +2534,10 @@ static int is_exported(const char *name, unsigned long 
value,
 {
const struct kernel_symbol *ks;
if (!mod)
-   ks = lookup_symbol(name, __start___ksymtab, __stop___ksymtab);
+   ks = lookup_exported_symbol(name, __start___ksymtab, 
__stop___ksymtab);
else
-   ks = lookup_symbol(name, mod->syms, mod->syms + mod->num_syms);
+   ks = lookup_exported_symbol(name, mod->syms, mod->syms + 
mod->num_syms);
+
return ks != NULL && kernel_symbol_value(ks) == value;
 }
 
@@ -3592,7 +3594,7 @@ stat

[PATCH] jbd2: clean up indentation issue, replace spaces with tab

2018-11-23 Thread Colin King
From: Colin Ian King 

There is a statement that is indented with spaces, replace it with
a tab.

Signed-off-by: Colin Ian King 
---
 fs/jbd2/transaction.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index c0b66a7a795b..4af8ff17ad5c 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -910,7 +910,7 @@ do_get_write_access(handle_t *handle, struct journal_head 
*jh,
 * this is the first time this transaction is touching this buffer,
 * reset the modified flag
 */
-   jh->b_modified = 0;
+   jh->b_modified = 0;
 
/*
 * If the buffer is not journaled right now, we need to make sure it
-- 
2.19.1



Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-23 Thread Josh Poimboeuf
On Thu, Nov 22, 2018 at 12:13:41PM +0100, Ingo Molnar wrote:
> Note to self: watch out for patches that change altinstructions and don't 
> make premature vmlinux size impact assumptions. :-)

I noticed a similar problem with ORC data.  As it turns out, size's
"text" calculation also includes read-only sections.  That includes
.rodata and anything else not writable.

Maybe we need a more sensible "size" script for the kernel.  It would be
trivial to implement based on the output of "readelf -S vmlinux".

-- 
Josh


Re: [PATCH 14/17] ARM: OMAP2+: use pdata quirks for PRUSS reset lines on AM335x

2018-11-23 Thread Tony Lindgren
* Roger Quadros  [181122 11:40]:
> From: Suman Anna 
> 
> The omap_device API is needed to perform the reset management for
> any IP instances with PRCM RSTCTRL registers (hard reset lines).
> This API is limited to the mach-omap2 layer, and cannot be exposed
> to drivers layer directly. So use platform data ops and pdata quirks
> for the PRUSS IP in AM335x SoCs to plumb the required omap_device
> API. The PRUSS SoC bus driver can then use these pdata ops to
> achieve the required reset functionality.
> 
> This is being implemented this way as there is no separate reset
> driver at the moment.

If the reset-simple approach has issues we can certainly do
this until the issues are fixed. I believe Suman had an issue
where we need to prevent the related clockdomain from idling.

Regards,

Tony


Re: [PATCH 08/17] soc: ti: pruss: Add a PRUSS irqchip driver for PRUSS interrupts

2018-11-23 Thread Tony Lindgren
* Roger Quadros  [181122 11:39]:
> The PRUSS INTC module is reference counted during the interrupt
> setup phase through the irqchip's irq_request_resources() and
> irq_release_resources() ops. This restricts the module from being
> removed as long as there are active interrupt users.

So are there any reasons why this could not be just a regular
drivers/irqchip driver nowadays?

Regards,

Tony


Re: [PATCH] x86: only use ERMS for user copies for larger sizes

2018-11-23 Thread Linus Torvalds
On Fri, Nov 23, 2018 at 2:12 AM David Laight  wrote:
>
> I've just patched my driver and redone the test on a 4.13 (ubuntu) kernel.
> Calling memcpy_fromio(kernel_buffer, PCIe_address, length)
> generates a lot of single byte TLP.

I just tested it too - it turns out that the __inline_memcpy() code
never triggers, and "memcpy_toio()" just generates a memcpy.

So that code seems entirely dead.

And, in fact, the codebase I looked at was the historical one, because
I had been going back and looking at the history. The modern tree
*does* have the "__inline_memcpy()" function I pointed at, but it's
not actually hooked up to anything!

This actually has been broken for _ages_. The breakage goes back to
2010, and commit 6175ddf06b61 ("x86: Clean up mem*io functions"), and
it seems nobody really ever noticed - or thought that it was ok.

That commit claims that iomem has no special significance on x86, but
that really really isn't true, exactly because the access size does
matter.

And as mentioned, the generic memory copy routines are not at all
appropriate, and that has nothing to do with ERMS. Our "do it by hand"
memory copy routine does things like this:

.Lless_16bytes:
cmpl $8,%edx
jb   .Lless_8bytes
/*
 * Move data from 8 bytes to 15 bytes.
 */
movq 0*8(%rsi), %r8
movq -1*8(%rsi, %rdx),  %r9
movq %r8,   0*8(%rdi)
movq %r9,   -1*8(%rdi, %rdx)
retq

and note how for a 8-byte copy, it will do *two* reads of the same 8
bytes, and *two* writes of the same 8 byte destination. That's
perfectly ok for regular memory, and it means that the code can handle
an arbitrary 8-15 byte copy without any conditionals or loop counts,
but it is *not* ok for iomem.

Of course, in practice it all just happens to work in almost all
situations (because a lot of iomem devices simply won't care), and
manual access to iomem is basically extremely rare these days anyway,
but it's definitely entirely and utterly broken.

End result: we *used* to do this right. For the last eight years our
"memcpy_{to,from}io()" has been entirely broken, and apparently even
the people who noticed oddities like David, never reported it as
breakage but instead just worked around it in drivers.

Ho humm.

Let me write a generic routine in lib/iomap_copy.c (which already does
the "user specifies chunk size" cases), and hook it up for x86.

David, are you using a bus analyzer or something to do your testing?
I'll have a trial patch for you asap.

   Linus


Re: [PATCH 07/17] soc: ti: pruss: enable OCP master ports in SYSCFG always

2018-11-23 Thread Tony Lindgren
* Roger Quadros  [181122 11:39]:
> @@ -160,6 +159,11 @@ static int pruss_enable_module(struct device *dev)
>   pruss_soc_bus_rmw(psoc_bus->syscfg, SYSCFG_STANDBY_MODE_MASK,
> SYSCFG_STANDBY_MODE_SMART);
>  
> + /* enable OCP master ports/disable MStandby */
> + ret = pruss_soc_bus_enable_ocp_master_ports(dev);
> + if (ret)
> + pruss_disable_module(dev);
> +
>   return ret;

Seems like all you should need to do here with ti-sysc is to
leave out SYSC_IDLE_SMART and SYSC_IDLE_SMART_WKUP in the dts
for ti,sysc-midle property for am335x.

Regards,

Tony


Re: [PATCH 09/17] irqchip: bcm283x: Switch to SPDX identifier

2018-11-23 Thread Stefan Wahren
Hi Marc,

> Marc Zyngier  hat am 23. November 2018 um 16:34 
> geschrieben:
> 
> 
> On 23/11/2018 15:23, Stefan Wahren wrote:
> > Hi Jason,
> > hi Marc,
> > 
> > Am 10.11.18 um 16:54 schrieb Stefan Wahren:
> >> Adopt the SPDX license identifier headers to ease license compliance
> >> management.
> >>
> >> Cc: Simon Arlott 
> >> Cc: Eric Anholt 
> >> Signed-off-by: Stefan Wahren 
> >> ---
> >>  drivers/irqchip/irq-bcm2835.c | 11 +--
> >>  drivers/irqchip/irq-bcm2836.c | 11 +--
> >>  2 files changed, 2 insertions(+), 20 deletions(-)
> >>
> >> diff --git a/drivers/irqchip/irq-bcm2835.c b/drivers/irqchip/irq-bcm2835.c
> >> index d2da8a1..418245d 100644
> >> --- a/drivers/irqchip/irq-bcm2835.c
> >> +++ b/drivers/irqchip/irq-bcm2835.c
> >> @@ -1,17 +1,8 @@
> >> +// SPDX-License-Identifier: GPL-2.0+
> >>  /*
> >>   * Copyright 2010 Broadcom
> >>   * Copyright 2012 Simon Arlott, Chris Boot, Stephen Warren
> >>   *
> >> - * This program is free software; you can redistribute it and/or modify
> >> - * it under the terms of the GNU General Public License as published by
> >> - * the Free Software Foundation; either version 2 of the License, or
> >> - * (at your option) any later version.
> >> - *
> >> - * This program is distributed in the hope that it will be useful,
> >> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> - * GNU General Public License for more details.
> >> - *
> >>   * Quirk 1: Shortcut interrupts don't set the bank 1/2 register pending 
> >> bits
> >>   *
> >>   * If an interrupt fires on bank 1 that isn't in the shortcuts list, bit 8
> >> diff --git a/drivers/irqchip/irq-bcm2836.c b/drivers/irqchip/irq-bcm2836.c
> >> index dfe4a46..2038693 100644
> >> --- a/drivers/irqchip/irq-bcm2836.c
> >> +++ b/drivers/irqchip/irq-bcm2836.c
> >> @@ -1,17 +1,8 @@
> >> +// SPDX-License-Identifier: GPL-2.0+
> >>  /*
> >>   * Root interrupt controller for the BCM2836 (Raspberry Pi 2).
> >>   *
> >>   * Copyright 2015 Broadcom
> >> - *
> >> - * This program is free software; you can redistribute it and/or modify
> >> - * it under the terms of the GNU General Public License as published by
> >> - * the Free Software Foundation; either version 2 of the License, or
> >> - * (at your option) any later version.
> >> - *
> >> - * This program is distributed in the hope that it will be useful,
> >> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> - * GNU General Public License for more details.
> >>   */
> >>  
> >>  #include 
> > 
> > Eric already made a review. Are you fine with it?
> 
> Sure. How do you want this to go in?

i would prefer if this patch go via irqchip tree.

Thanks Stefan

> If you have a series repainting
> multiple subsystems, I'd rather it stays as such, and you can stick my
> 
> Acked-by: Marc Zyngier 
> 
> on it.
> 
> Thanks,
> 
>   M.
> -- 
> Jazz is not dead. It just smells funny...


[PATCH] ext4: clean up indentation issues, remove extraneous tabs

2018-11-23 Thread Colin King
From: Colin Ian King 

There are several lines that are indented too far, clean these
up by removing the tabs.

Signed-off-by: Colin Ian King 
---
 fs/ext4/migrate.c | 12 ++--
 fs/ext4/super.c   |  2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index 61a9d1927817..5ae61cd8c5c9 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -535,22 +535,22 @@ int ext4_ext_migrate(struct inode *inode)
if (i_data[EXT4_IND_BLOCK]) {
retval = update_ind_extent_range(handle, tmp_inode,
le32_to_cpu(i_data[EXT4_IND_BLOCK]), &lb);
-   if (retval)
-   goto err_out;
+   if (retval)
+   goto err_out;
} else
lb.curr_block += max_entries;
if (i_data[EXT4_DIND_BLOCK]) {
retval = update_dind_extent_range(handle, tmp_inode,
le32_to_cpu(i_data[EXT4_DIND_BLOCK]), &lb);
-   if (retval)
-   goto err_out;
+   if (retval)
+   goto err_out;
} else
lb.curr_block += max_entries * max_entries;
if (i_data[EXT4_TIND_BLOCK]) {
retval = update_tind_extent_range(handle, tmp_inode,
le32_to_cpu(i_data[EXT4_TIND_BLOCK]), &lb);
-   if (retval)
-   goto err_out;
+   if (retval)
+   goto err_out;
}
/*
 * Build the last extent
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 5603a4a1a864..15fc71398747 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1940,7 +1940,7 @@ static int handle_mount_opt(struct super_block *sb, char 
*opt, int token,
 #ifdef CONFIG_FS_DAX
ext4_msg(sb, KERN_WARNING,
"DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
-   sbi->s_mount_opt |= m->mount_opt;
+   sbi->s_mount_opt |= m->mount_opt;
 #else
ext4_msg(sb, KERN_INFO, "dax option not supported");
return -1;
-- 
2.19.1



Re: [RFC 00/60] Coscheduling for Linux

2018-11-23 Thread Frederic Weisbecker
On Thu, Sep 27, 2018 at 11:36:34AM -0700, Subhra Mazumdar wrote:
> 
> 
> On 09/26/2018 02:58 AM, Jan H. Schönherr wrote:
> >On 09/17/2018 02:25 PM, Peter Zijlstra wrote:
> >>On Fri, Sep 14, 2018 at 06:25:44PM +0200, Jan H. Schönherr wrote:
> >>
> >>>Assuming, there is a cgroup-less solution that can prevent simultaneous
> >>>execution of tasks on a core, when they're not supposed to. How would you
> >>>tell the scheduler, which tasks these are?
> >>Specifically for L1TF I hooked into/extended KVM's preempt_notifier
> >>registration interface, which tells us which tasks are VCPUs and to
> >>which VM they belong.
> >>
> >>But if we want to actually expose this to userspace, we can either do a
> >>prctl() or extend struct sched_attr.
> >Both, Peter and Subhra, seem to prefer an interface different than cgroups
> >to specify what to coschedule.
> >
> >Can you provide some extra motivation for me, why you feel that way?
> >(ignoring the current scalability issues with the cpu group controller)
> >
> >
> >After all, cgroups where designed to create arbitrary groups of tasks and
> >to attach functionality to those groups.
> >
> >If we were to introduce a different interface to control that, we'd need to
> >introduce a whole new group concept, so that you make tasks part of some
> >group while at the same time preventing unauthorized tasks from joining a
> >group.
> >
> >
> >I currently don't see any wins, just a loss in flexibility.
> >
> >Regards
> >Jan
> I think cgroups will the get the job done for any use case. But we have,
> e.g. affinity control via both sched_setaffinity and cgroup cpusets. It
> will be good to have an alternative way to specify co-scheduling too for
> those who don't want to use cgroup for some reason. It can be added later
> on though, only how one will override the other will need to be sorted out.

I kind of agree with Jan here that this is just going to add yet another task
group mechanism, very similar to the existing one, with runqueues inside and 
all.

Can you imagine kernel/sched/fair.c now dealing with both groups 
implementations?
What happens when cgroup task groups and cosched sched groups don't match wrt.
their tasks, their priorities, etc...

I understand cgroup task group has become infamous. But it may be a better idea
in the long run to fix it.


Re: [PATCH 05/17] soc: ti: pruss: Configure SYSCFG properly during probe/remove

2018-11-23 Thread Tony Lindgren
* Roger Quadros  [181122 11:39]:
> +/* firmware must be idle when calling this function */
> +static void pruss_disable_module(struct device *dev)
> +{
> + struct pruss_soc_bus *psoc_bus = dev_get_drvdata(dev);
> +
> + /* configure Smart Standby */
> + pruss_soc_bus_rmw(psoc_bus->syscfg, SYSCFG_STANDBY_MODE_MASK,
> +   SYSCFG_STANDBY_MODE_SMART);
> +
> + /* initiate MStandby */
> + pruss_soc_bus_rmw(psoc_bus->syscfg, SYSCFG_STANDBY_INIT,
> +   SYSCFG_STANDBY_INIT);
> +
> + /* tell PRCM to initiate IDLE request */
> + pm_runtime_put_sync(dev);
> +}
> +
> +static int pruss_enable_module(struct device *dev)
> +{
> + struct pruss_soc_bus *psoc_bus = dev_get_drvdata(dev);
> + int ret;
> +
> + /* tell PRCM to de-assert IDLE request */
> + ret = pm_runtime_get_sync(dev);
> + if (ret < 0) {
> + pm_runtime_put_noidle(dev);
> + return ret;
> + }
> +
> + /* configure for Smart Idle & Smart Standby */
> + pruss_soc_bus_rmw(psoc_bus->syscfg, SYSCFG_IDLE_MODE_MASK,
> +   SYSCFG_IDLE_MODE_SMART);
> + pruss_soc_bus_rmw(psoc_bus->syscfg, SYSCFG_STANDBY_MODE_MASK,
> +   SYSCFG_STANDBY_MODE_SMART);
> +
> + return ret;
> +}

Yeah so nothing PRU specific here, this you should be able to
handle in a generic way with drivers/bus/ti-sysc.c.

Regards,

Tony


Re: [PATCH 01/17] dt-bindings: remoteproc: Add TI PRUSS bindings

2018-11-23 Thread Tony Lindgren
Hi,

* Roger Quadros  [181122 11:39]:
> From: Suman Anna 
> +Example:
> +
> +1.   /* AM33xx PRU-ICSS */
> + pruss_soc_bus: pruss_soc_bus@4a326004 {
> + compatible = "ti,am3356-pruss-soc-bus";
> + ti,hwmods = "pruss";
> + reg = <0x4a326004 0x4>;
> + #address-cells = <1>;
> + #size-cells = <1>;
> + ranges;

The top level interconnect target module driver should be just ti-sysc
as documented in Documentation/devicetree/bindings/bus/ti-sysc.txt.
AFAIK there is nothing PRU specific there. So let's not add yet
another custom interconnect target module handling code to deal with.

I also posted a patch a while back for using reset-simple with
ti-sysc as "[PATCHv2] reset: ti-rstctrl: use the reset-simple driver".

> + pruss_cfg: cfg@4a326000 {
> + compatible = "syscon";
> + reg = <0x4a326000 0x2000>;
> + };
> +
> + pruss_iep: iep@4a32e000 {
> + compatible = "syscon";
> + reg = <0x4a32e000 0x31c>;
> + };
> +
> + pruss_mii_rt: mii_rt@4a332000 {
> + compatible = "syscon";
> + reg = <0x4a332000 0x58>;
> + };

Hmm what are these syscon register actually doing? Sseems like
they should be just handled by a phy driver nowadays?

Other than that the binding looks OK to me. Good to finally
see some activity in getting the PRU support merged :)

Regards,

Tony


[PATCH] xen/x86: add diagnostic printout to xen_mc_flush() in case of error

2018-11-23 Thread Juergen Gross
Failure of an element of a Xen multicall is signalled via a WARN()
only unless the kernel is compiled with MC_DEBUG. It is impossible to
know which element failed and why it did so.

Change that by printing the related information even without MC_DEBUG,
even if maybe in some limited form (e.g. without information which
caller produced the failing element).

Move the printing out of the switch statement in order to have the
same information for a single call.

Signed-off-by: Juergen Gross 
---
 arch/x86/xen/multicalls.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/arch/x86/xen/multicalls.c b/arch/x86/xen/multicalls.c
index 2bce7958ce8b..0766a08bdf45 100644
--- a/arch/x86/xen/multicalls.c
+++ b/arch/x86/xen/multicalls.c
@@ -69,6 +69,11 @@ void xen_mc_flush(void)
 
trace_xen_mc_flush(b->mcidx, b->argidx, b->cbidx);
 
+#if MC_DEBUG
+   memcpy(b->debug, b->entries,
+  b->mcidx * sizeof(struct multicall_entry));
+#endif
+
switch (b->mcidx) {
case 0:
/* no-op */
@@ -87,32 +92,34 @@ void xen_mc_flush(void)
break;
 
default:
-#if MC_DEBUG
-   memcpy(b->debug, b->entries,
-  b->mcidx * sizeof(struct multicall_entry));
-#endif
-
if (HYPERVISOR_multicall(b->entries, b->mcidx) != 0)
BUG();
for (i = 0; i < b->mcidx; i++)
if (b->entries[i].result < 0)
ret++;
+   }
 
+   if (WARN_ON(ret)) {
+   pr_err("%d of %d multicall(s) failed: cpu %d\n",
+  ret, b->mcidx, smp_processor_id());
+   for (i = 0; i < b->mcidx; i++) {
+   if (b->entries[i].result < 0) {
 #if MC_DEBUG
-   if (ret) {
-   printk(KERN_ERR "%d multicall(s) failed: cpu %d\n",
-  ret, smp_processor_id());
-   dump_stack();
-   for (i = 0; i < b->mcidx; i++) {
-   printk(KERN_DEBUG "  call %2d/%d: op=%lu 
arg=[%lx] result=%ld\t%pF\n",
-  i+1, b->mcidx,
+   pr_err("  call %2d: op=%lu arg=[%lx] 
result=%ld\t%pF\n",
+  i + 1,
   b->debug[i].op,
   b->debug[i].args[0],
   b->entries[i].result,
   b->caller[i]);
+#else
+   pr_err("  call %2d: op=%lu arg=[%lx] 
result=%ld\n",
+  i + 1,
+  b->entries[i].op,
+  b->entries[i].args[0],
+  b->entries[i].result);
+#endif
}
}
-#endif
}
 
b->mcidx = 0;
@@ -126,8 +133,6 @@ void xen_mc_flush(void)
b->cbidx = 0;
 
local_irq_restore(flags);
-
-   WARN_ON(ret);
 }
 
 struct multicall_space __xen_mc_entry(size_t args)
-- 
2.16.4



Re: [PATCH 2/4] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE

2018-11-23 Thread Sudeep Holla
On Fri, Nov 23, 2018 at 05:04:18PM +0100, Daniel Lezcano wrote:
> On 23/11/2018 14:58, Sudeep Holla wrote:
> > On Mon, Oct 29, 2018 at 05:23:18PM +0100, Daniel Lezcano wrote:
> >> The mutex protects a per_cpu variable access. The potential race can
> >> happen only when the cpufreq governor module is loaded and at the same
> >> time the cpu capacity is changed in the sysfs.
> >>
> >
> > I wonder if we really need that sysfs entry to be writable. For some
> > reason, I had assumed it's read only, obviously it's not. I prefer to
> > make it RO if there's no strong reason other than debug purposes.
>
> Are you suggesting to remove the READ_ONCE/WRITE_ONCE patch and set the
> sysfs file read-only ?
>

Just to be sure, if we retain RW capability we still need to fix the
race you are pointing out.

However I just don't see the need for RW cpu_capacity sysfs and hence
asking the reason here. IIRC I had pointed this out long back(not sure
internally or externally) but seemed to have missed the version that got
merged. So I am just asking if we really need write capability given that
it has known issues.

If user-space starts writing the value to influence the scheduler, then
it makes it difficult for kernel to change the way it uses the
cpu_capacity in future.

Sorry if there's valid usecase and I am just making noise here.

--
Regards,
Sudeep


Re: [PATCH 3/3] spi: at91-usart: add DMA support

2018-11-23 Thread Radu Nicolae Pirea
On Wed, 2018-11-21 at 17:38 +, Robin Murphy wrote:
> On 21/11/2018 11:27, Radu Pirea wrote:
> > This patch adds support for DMA. Transfers are done with dma only
> > if
> > they are longer than 16 bytes in order to achieve a better
> > performance.
> > DMA setup introduces a little overhead and for transfers shorter
> > than 16
> > bytes there is no performance improvement.
> > 
> > Signed-off-by: Radu Pirea 
> > ---
> >   drivers/spi/spi-at91-usart.c | 223
> > ++-
> >   1 file changed, 221 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/spi/spi-at91-usart.c b/drivers/spi/spi-at91-
> > usart.c
> > index 0b07c746453d..4d908afeaec9 100644
> > --- a/drivers/spi/spi-at91-usart.c
> > +++ b/drivers/spi/spi-at91-usart.c
> > @@ -8,9 +8,12 @@
> >   
> >   #include 
> >   #include 
> > +#include 
> > +#include 
> 
> It looks rather odd to include this from a driver that isn't
> otherwise 
> touching anything from linux/dma-mapping.h.
> 
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   #include 
> >   #include 
> >   #include 
> > @@ -58,6 +61,8 @@
> >   
> >   #define US_INIT \
> > (US_MR_SPI_MASTER | US_MR_CHRL | US_MR_CLKO | US_MR_WRDBT)
> > +#define US_DMA_MIN_BYTES   16
> > +#define US_DMA_TIMEOUT (msecs_to_jiffies(1000))
> >   
> >   /* Register access macros */
> >   #define at91_usart_spi_readl(port, reg) \
> > @@ -71,14 +76,19 @@
> > writeb_relaxed((value), (port)->regs + US_##reg)
> >   
> >   struct at91_usart_spi {
> > +   struct platform_device  *mpdev;
> > struct spi_transfer *current_transfer;
> > void __iomem*regs;
> > struct device   *dev;
> > struct clk  *clk;
> >   
> > +   struct completion   xfer_completion;
> > +
> > /*used in interrupt to protect data reading*/
> > spinlock_t  lock;
> >   
> > +   phys_addr_t phybase;
> > +
> > int irq;
> > unsigned intcurrent_tx_remaining_bytes;
> > unsigned intcurrent_rx_remaining_bytes;
> > @@ -87,8 +97,184 @@ struct at91_usart_spi {
> > u32 status;
> >   
> > boolxfer_failed;
> > +   booluse_dma;
> >   };
> >   
> > +static void dma_callback(void *data)
> > +{
> > +   struct spi_controller   *ctlr = data;
> > +   struct at91_usart_spi   *aus = spi_master_get_devdata(ctlr);
> > +
> > +   at91_usart_spi_writel(aus, IER, US_IR_RXRDY);
> > +   aus->current_rx_remaining_bytes = 0;
> > +   complete(&aus->xfer_completion);
> > +}
> > +
> > +static bool at91_usart_spi_can_dma(struct spi_controller *ctrl,
> > +  struct spi_device *spi,
> > +  struct spi_transfer *xfer)
> > +{
> > +   struct at91_usart_spi *aus = spi_master_get_devdata(ctrl);
> > +
> > +   return aus->use_dma && xfer->len >= US_DMA_MIN_BYTES;
> > +}
> > +
> > +static int at91_usart_spi_configure_dma(struct spi_controller
> > *ctlr,
> > +   struct at91_usart_spi *aus)
> > +{
> > +   struct dma_slave_config slave_config;
> > +   struct device *dev = &aus->mpdev->dev;
> > +   phys_addr_t phybase = aus->phybase;
> > +   dma_cap_mask_t mask;
> > +   int err = 0;
> > +
> > +   dma_cap_zero(mask);
> > +   dma_cap_set(DMA_SLAVE, mask);
> > +
> > +   ctlr->dma_tx = dma_request_slave_channel_reason(dev, "tx");
> > +   if (IS_ERR_OR_NULL(ctlr->dma_tx)) {
> > +   if (IS_ERR(ctlr->dma_tx)) {
> > +   err = PTR_ERR(ctlr->dma_tx);
> > +   goto at91_usart_spi_error_clear;
> > +   }
> > +
> > +   dev_dbg(dev,
> > +   "DMA TX channel not available, SPI unable to
> > use DMA\n");
> > +   err = -EBUSY;
> > +   goto at91_usart_spi_error_clear;
> > +   }
> > +
> > +   ctlr->dma_rx = dma_request_slave_channel_reason(dev, "rx");
> > +   if (IS_ERR_OR_NULL(ctlr->dma_rx)) {
> > +   if (IS_ERR(ctlr->dma_rx)) {
> > +   err = PTR_ERR(ctlr->dma_rx);
> > +   goto at91_usart_spi_error;
> > +   }
> > +
> > +   dev_dbg(dev,
> > +   "DMA RX channel not available, SPI unable to
> > use DMA\n");
> > +   err = -EBUSY;
> > +   goto at91_usart_spi_error;
> > +   }
> > +
> > +   slave_config.dst_addr_width = DMA_SLAVE_BUSWIDTH_1_BYTE;
> > +   slave_config.src_addr_width = DMA_SLAVE_BUSWIDTH_1_BYTE;
> > +   slave_config.dst_addr = (dma_addr_t)phybase + US_THR;
> > +   slave_config.src_addr = (dma_addr_t)phybase + US_RHR;
> > +   slave_config.src_maxburst = 1;
> > +   slave_config.dst_maxburst = 1;
> > +   slave_config.device_fc = false;
> > +
> > +   slave_config.direction = DMA_DEV_TO_MEM;
> > +   if (dmaengine_slave_config(ctlr->dma_rx, &slave_config)) {
> > +   dev_err(&ctlr->dev,
> > +   "failed to configure rx dma channel\n");
> > +   err = -EINVAL;
> >

[PATCH] pinctrl: nuvoton: check for devm_kasprintf() failure

2018-11-23 Thread Nicholas Mc Guire
devm_kasprintf() may return NULL on failure of internal allocation thus
the assignment to  .label  is not safe if not checked. On error
npcm7xx_gpio_of() returns negative values so -ENOMEM in the
(unlikely) failure case of devm_kasprintf() should be fine here.

Signed-off-by: Nicholas Mc Guire 
Fixes: 3b588e43ee5c ("pinctrl: nuvoton: add NPCM7xx pinctrl and GPIO driver")
---

Problem located with experimental coccinelle script

The only call-site is the probe function (npcm7xx_pinctrl_probe()) wich
will bail out on error - it seems like no further cleanup is expected.

Also while the code generally uses the  if (!var)  construct that would
not be very readable in this case so I think the:
 if (pctrl->gpio_bank[id].gc.label == NULL)
is more appropriate here not sure though what the general preference is.

Patch was compile tested with: multi_v7_defconfig + ARCH_NPCM=y,
ARCH_NPCM7XX=y (implies PINCTRL_NPCM7XX=y)
(sparse reports a number of "obsolete array initializer" warnings)

Patch is against 4.20-rc3 (localversion-next is next-20181123)

 drivers/pinctrl/nuvoton/pinctrl-npcm7xx.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pinctrl/nuvoton/pinctrl-npcm7xx.c 
b/drivers/pinctrl/nuvoton/pinctrl-npcm7xx.c
index b455209..17f909d 100644
--- a/drivers/pinctrl/nuvoton/pinctrl-npcm7xx.c
+++ b/drivers/pinctrl/nuvoton/pinctrl-npcm7xx.c
@@ -1925,6 +1925,9 @@ static int npcm7xx_gpio_of(struct npcm7xx_pinctrl *pctrl)
pctrl->gpio_bank[id].gc.label =
devm_kasprintf(pctrl->dev, GFP_KERNEL, "%pOF",
   np);
+   if (pctrl->gpio_bank[id].gc.label == NULL)
+   return -ENOMEM;
+
pctrl->gpio_bank[id].gc.dbg_show = npcmgpio_dbg_show;
pctrl->gpio_bank[id].direction_input =
pctrl->gpio_bank[id].gc.direction_input;
-- 
2.1.4



Re: [PATCH] dmaengine: ti: omap-dma: Configure LCH_TYPE for OMAP1

2018-11-23 Thread Russell King - ARM Linux
Hi Peter,

Here's the patch, which should now support IN as well as OUT.
Completely untested, as mentioned before.

 drivers/usb/gadget/udc/omap_udc.c | 286 ++
 drivers/usb/gadget/udc/omap_udc.h |   3 +-
 2 files changed, 135 insertions(+), 154 deletions(-)

diff --git a/drivers/usb/gadget/udc/omap_udc.c 
b/drivers/usb/gadget/udc/omap_udc.c
index 3a16431da321..ad6f315e4327 100644
--- a/drivers/usb/gadget/udc/omap_udc.c
+++ b/drivers/usb/gadget/udc/omap_udc.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -203,7 +204,7 @@ static int omap_ep_enable(struct usb_ep *_ep,
/* set endpoint to initial state */
ep->dma_channel = 0;
ep->has_dma = 0;
-   ep->lch = -1;
+   ep->dma = NULL;
use_ep(ep, UDC_EP_SEL);
omap_writew(udc->clr_halt, UDC_CTRL);
ep->ackwait = 0;
@@ -468,43 +469,6 @@ static int read_fifo(struct omap_ep *ep, struct omap_req 
*req)
 
 /*-*/
 
-static u16 dma_src_len(struct omap_ep *ep, dma_addr_t start)
-{
-   dma_addr_t  end;
-
-   /* IN-DMA needs this on fault/cancel paths, so 15xx misreports
-* the last transfer's bytecount by more than a FIFO's worth.
-*/
-   if (cpu_is_omap15xx())
-   return 0;
-
-   end = omap_get_dma_src_pos(ep->lch);
-   if (end == ep->dma_counter)
-   return 0;
-
-   end |= start & (0x << 16);
-   if (end < start)
-   end += 0x1;
-   return end - start;
-}
-
-static u16 dma_dest_len(struct omap_ep *ep, dma_addr_t start)
-{
-   dma_addr_t  end;
-
-   end = omap_get_dma_dst_pos(ep->lch);
-   if (end == ep->dma_counter)
-   return 0;
-
-   end |= start & (0x << 16);
-   if (cpu_is_omap15xx())
-   end++;
-   if (end < start)
-   end += 0x1;
-   return end - start;
-}
-
-
 /* Each USB transfer request using DMA maps to one or more DMA transfers.
  * When DMA completion isn't request completion, the UDC continues with
  * the next DMA transfer for that USB transfer.
@@ -512,34 +476,53 @@ static u16 dma_dest_len(struct omap_ep *ep, dma_addr_t 
start)
 
 static void next_in_dma(struct omap_ep *ep, struct omap_req *req)
 {
-   u16 txdma_ctrl, w;
-   unsignedlength = req->req.length - req->req.actual;
-   const int   sync_mode = cpu_is_omap15xx()
-   ? OMAP_DMA_SYNC_FRAME
-   : OMAP_DMA_SYNC_ELEMENT;
-   int dma_trigger = 0;
+   struct dma_async_tx_descriptor *tx;
+   struct dma_chan *dma = ep->dma;
+   dma_cookie_t cookie;
+   unsigned burst, length;
+   u16 txdma_ctrl, w;
+   struct dma_slave_config omap_udc_in_cfg = {
+   .direction = DMA_MEM_TO_DEV,
+   .dst_addr = UDC_DATA_DMA,
+   };
+
+   length = req->req.length - req->req.actual;
 
/* measure length in either bytes or packets */
-   if ((cpu_is_omap16xx() && length <= UDC_TXN_TSC)
-   || (cpu_is_omap15xx() && length < ep->maxpacket)) {
+   if ((cpu_is_omap16xx() && length <= UDC_TXN_TSC) ||
+   (cpu_is_omap15xx() && length < ep->maxpacket)) {
+   omap_udc_in_cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_1_BYTE;
txdma_ctrl = UDC_TXN_EOT | length;
-   omap_set_dma_transfer_params(ep->lch, OMAP_DMA_DATA_TYPE_S8,
-   length, 1, sync_mode, dma_trigger, 0);
+   burst = length;
} else {
-   length = min(length / ep->maxpacket,
-   (unsigned) UDC_TXN_TSC + 1);
+   omap_udc_in_cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_2_BYTE;
+   length = min_t(unsigned, length / ep->maxpacket,
+  UDC_TXN_TSC + 1);
txdma_ctrl = length;
-   omap_set_dma_transfer_params(ep->lch, OMAP_DMA_DATA_TYPE_S16,
-   ep->ep.maxpacket >> 1, length, sync_mode,
-   dma_trigger, 0);
length *= ep->maxpacket;
+   burst = ep->ep.maxpacket >> 1;
}
-   omap_set_dma_src_params(ep->lch, OMAP_DMA_PORT_EMIFF,
-   OMAP_DMA_AMODE_POST_INC, req->req.dma + req->req.actual,
-   0, 0);
 
-   omap_start_dma(ep->lch);
-   ep->dma_counter = omap_get_dma_src_pos(ep->lch);
+   if (!cpu_is_omap15xx())
+   burst = 1;
+
+   omap_udc_in_cfg.dst_maxburst = burst;
+
+   if (WARN_ON(dmaengine_slave_config(dma, &omap_udc_in_cfg)))
+   return;
+
+   tx = dmaengine_prep_slave_single(dma, req->req.dma + req->req.actual,
+length, DMA_MEM_TO_DEV, 0);
+   if (WARN_ON(!tx))
+   return;
+
+   cookie =

Re: [PATCH] perf map: remove extra indirection from map__find()

2018-11-23 Thread Jiri Olsa
On Fri, Nov 23, 2018 at 02:42:39AM -0800, Eric Saint-Etienne wrote:
> A double pointer is used in map__find() where a single pointer is enough
> because the function doesn't affect the rbtree and the rbtree is locked.
> 
> Signed-off-by: Eric Saint-Etienne 

Acked-by: Jiri Olsa 

thanks,
jirka

> ---
>  tools/perf/util/map.c | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
> index 354e545..3dac766 100644
> --- a/tools/perf/util/map.c
> +++ b/tools/perf/util/map.c
> @@ -846,19 +846,18 @@ void maps__remove(struct maps *maps, struct map *map)
>  
>  struct map *maps__find(struct maps *maps, u64 ip)
>  {
> - struct rb_node **p, *parent = NULL;
> + struct rb_node *p;
>   struct map *m;
>  
>   down_read(&maps->lock);
>  
> - p = &maps->entries.rb_node;
> - while (*p != NULL) {
> - parent = *p;
> - m = rb_entry(parent, struct map, rb_node);
> + p = maps->entries.rb_node;
> + while (p != NULL) {
> + m = rb_entry(p, struct map, rb_node);
>   if (ip < m->start)
> - p = &(*p)->rb_left;
> + p = p->rb_left;
>   else if (ip >= m->end)
> - p = &(*p)->rb_right;
> + p = p->rb_right;
>   else
>   goto out;
>   }
> -- 
> 1.8.3.1
> 


Re: [PATCH 2/3] dt-bindings: mfd: atmel-usart: add DMA bindings for SPI mode

2018-11-23 Thread Radu Nicolae Pirea
On Wed, 2018-11-21 at 10:41 -0600, Rob Herring wrote:
> On Wed, Nov 21, 2018 at 5:29 AM Radu Pirea  > wrote:
> > The bindings for DMA are now common for both drivers of the USART
> > IP.
> > 
> > The node given as an example for USART in SPI mode has been updated
> > in
> > order to include DMA bindings.
> > 
> > Signed-off-by: Radu Pirea 
> > ---
> >  .../devicetree/bindings/mfd/atmel-usart.txt   | 15 ++-
> > 
> >  1 file changed, 10 insertions(+), 5 deletions(-)
> > 
> > diff --git a/Documentation/devicetree/bindings/mfd/atmel-usart.txt
> > b/Documentation/devicetree/bindings/mfd/atmel-usart.txt
> > index 7f0cd72f47d2..8ad008175343 100644
> > --- a/Documentation/devicetree/bindings/mfd/atmel-usart.txt
> > +++ b/Documentation/devicetree/bindings/mfd/atmel-usart.txt
> > @@ -17,17 +17,19 @@ Required properties for USART in SPI mode:
> >  - cs-gpios: chipselects (internal cs not supported)
> >  - atmel,usart-mode : Must be  (found in dt-
> > bindings/mfd/at91-usart.h)
> > 
> > +Optional properties in serial and SPI mode:
> > +- dma bindings for dma transfer:
> > +   - dmas: DMA specifier, consisting of a phandle to DMA
> > controller node,
> > +   memory peripheral interface and USART DMA channel
> > ID, FIFO configuration.
> > +   Refer to dma.txt and atmel-dma.txt for details.
> > +   - dma-names: "rx" for RX channel, "tx" for TX channel.
> > +   dma-names = "tx", "rx";
> 
> The dma-names should have a defined order.

Thanks Rob.
I will specify in bindings a fixed order for dmas and dma-names.

> 
> Rob



Re: [PATCH v4 2/4] namei: O_BENEATH-style path resolution flags

2018-11-23 Thread Andy Lutomirski
> On Nov 23, 2018, at 5:10 AM, Jürg Billeter  wrote:
>
> Hi Aleksa,
>
>> On Tue, 2018-11-13 at 01:26 +1100, Aleksa Sarai wrote:
>> * O_BENEATH: Disallow "escapes" from the starting point of the
>>  filesystem tree during resolution (you must stay "beneath" the
>>  starting point at all times). Currently this is done by disallowing
>>  ".." and absolute paths (either in the given path or found during
>>  symlink resolution) entirely, as well as all "magic link" jumping.
>
> With open_tree(2) and OPEN_TREE_CLONE, will O_BENEATH still be
> necessary?

This discussion reminds me of something I’m uncomfortable with in the
current patches: currently, most of the O_ flags determine some
property of the returned opened file.  The new O_ flags you're adding
don't -- instead, they affect the lookup of the file.  So O_BENEATH
doesn't return a descriptor that can only be used to loop up files
beneath it -- it just controls whether open(2) succeeds or fails.  It
might be nice for the naming of the flags to reflect this.  I also
don't love that we have some magic AT_ flags that work with some
syscalls and some magic O_ flags that work with others.

In this regard, I liked the AT_ naming better.  Although I don't love
adding AT_ flags because the restrict our ability to usefully use the
high bits of the fd in the future.

--Andy


Re: [PATCH 2/4] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE

2018-11-23 Thread Daniel Lezcano
On 23/11/2018 14:58, Sudeep Holla wrote:
> On Mon, Oct 29, 2018 at 05:23:18PM +0100, Daniel Lezcano wrote:
>> The mutex protects a per_cpu variable access. The potential race can
>> happen only when the cpufreq governor module is loaded and at the same
>> time the cpu capacity is changed in the sysfs.
>>
> 
> I wonder if we really need that sysfs entry to be writable. For some
> reason, I had assumed it's read only, obviously it's not. I prefer to
> make it RO if there's no strong reason other than debug purposes.

Are you suggesting to remove the READ_ONCE/WRITE_ONCE patch and set the
sysfs file read-only ?


-- 
  Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog



Re: [PATCH] perf symbols: Cannot disassemble some routines when debuginfo present

2018-11-23 Thread Jiri Olsa
On Fri, Nov 23, 2018 at 02:25:26AM -0800, Eric Saint-Etienne wrote:
> When the kernel is compiled with -ffunction-sections and perf uses the
> kernel debuginfo, perf fails the very first symbol lookup and ends up with
> an hex offset inside [kernel.vmlinux]. It's due to how perf loads the maps.
> 
> Indeed only .text gets loaded by map_groups__find() into al->map.
> Consequently al->map address range encompass the whole code.
> But map__load() has just loaded many function maps by splitting al->map,
> which reduced al->map range drastically. Very likely the target address is
> now in one of those newly created function maps, so we need to lookup the
> map again to find that new map.
> 
> This issue is not specific to the kernel but to how the image is linked.
> For the kernel, when we're not using the kernel debuginfo, perf will
> fallback to using kallsyms and then the first lookup will work.
> 
> This patch makes sure that the event address we're looking-up is indeed
> within the map we've found, otherwise we lookup another map again.
> Only one extra lookup at most is required for the proper map to be found,
> if it exists.
> 
> Signed-off-by: Eric Saint-Etienne 
> Reviewed-by: Darren Kenny 
> ---
>  tools/perf/util/event.c | 23 ++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index e9c108a..a69ef52 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -1571,7 +1571,28 @@ struct map *thread__find_map(struct thread *thread, u8 
> cpumode, u64 addr,
>*/
>   if (load_map)
>   map__load(al->map);
> - al->addr = al->map->map_ip(al->map, al->addr);
> +
> + /*
> +  * When using -ffunction-sections, only .text gets loaded by
> +  * map_groups__find() into al->map. Consequently al->map address
> +  * range encompass the whole code.
> +  *
> +  * But map__load() has just loaded many function maps by
> +  * splitting al->map, which reduced al->map range drastically.
> +  * Very likely the target address is now in one of those newly
> +  * created function maps, so we need to lookup the map again
> +  * to find that new map.
> +  */

hum, so map__load actualy can split the map to create new maps?

cold you please point me to that code? I haven't touch
this area for some time and I can't find it

thanks,
jirka

> + if (al->addr < al->map->start || al->addr >= al->map->end)
> + al->map = map_groups__find(mg, al->addr);
> +
> + /*
> +  * The new map *ought* to exist because the initial al->map
> +  * contained that address and subsequently has been split into
> +  * many *contiguous* maps.
> +  */
> + if (al->map != NULL)
> + al->addr = al->map->map_ip(al->map, al->addr);
>   }
>  
>   return al->map;
> -- 
> 1.8.3.1
> 


Re: [PATCH 1/2] irq/irq_sim: provide irq_sim_fire_edge()

2018-11-23 Thread Bartosz Golaszewski
śr., 21 lis 2018 o 20:15 Uwe Kleine-König
 napisał(a):
>
> Hello Bartosz,
>
> On Wed, Nov 21, 2018 at 05:34:32PM +0100, Bartosz Golaszewski wrote:
> > wt., 20 lis 2018 o 18:17 Uwe Kleine-König
> >  napisał(a):
> > >
> > > On Tue, Nov 20, 2018 at 02:40:31PM +0100, Bartosz Golaszewski wrote:
> > > > From: Bartosz Golaszewski 
> > > >
> > > > The irq_sim irqchip doesn't allow to configure the sensitivity so every
> > > > call to irq_sim_fire() fires a dummy interrupt. This used to not matter
> > > > for gpio-mockup (one of the irq_sim users) until commit fa38869b0161
> > > > ("gpiolib: Don't support irq sharing for userspace") which made it
> > > > impossible for gpio-mockup to ignore certain events (e.g. only receive
> > > > notifications about rising edge events).
> > > >
> > > > Introduce a specialized variant of irq_sim_fire() which takes another
> > > > argument called edge. allowing to specify the trigger type for the
> > > > dummy interrupt.
> > >
> > > I wonder if it's worth the effort to fix irq_sim. If you take a look in
> > > my gpio-simulator patch, it is trivial to get it right without external
> > > help with an amount of code that is usual for a driver that handles
> > > irqs.
> >
> > You're basically recommending handcrafting another local piece of code
> > for simulating interrupts - something that multiple users may be
> > interested in. You did that in your proposed gpio-simulator and I
> > still can't understand why you couldn't reuse the existing solution.
> > Even if it's broken for your use-case, it's surely easier to fix it
> > than to rewrite and duplicate it? There are very few cases where code
> > consolidation is not a good thing and I don't think this is one of
> > them.
>
> I don't say that factoring out common stuff is bad. But if in the end
> you call
>
> irq_sim_something(some, parameters, offset);
>
> with the simulator and if you don't use the irq simulator you do
>
> irq = irq_find_mapping(irqdomain, offset);
> generic_handle_irq(irq);
>
> I prefer the latter because it's only a single additional line and in
> return it's more obvious what it does because it's the same that many
> other drivers (for real hardware) also do.
>

I'm not sure I'm following you. You still need to add ~150 LOC for the
gpio_simulator_irqtrigger() worker and gpio_simulator_irq_*() routines
locally as you did in your gpio-simulator patch. A generic simulator +
using the irq_work saves you that.

Bart


Re: [PATCHv3 1/3] x86/mm: Move LDT remap out of KASLR region on 5-level paging

2018-11-23 Thread Kirill A. Shutemov
On Sat, Nov 10, 2018 at 08:29:05PM +0800, Baoquan He wrote:
> > diff --git a/Documentation/x86/x86_64/mm.txt 
> > b/Documentation/x86/x86_64/mm.txt
> > index 702898633b00..75bff98928a8 100644
> > --- a/Documentation/x86/x86_64/mm.txt
> > +++ b/Documentation/x86/x86_64/mm.txt
> > @@ -34,23 +34,24 @@ 
> > __||__|_|___
> >  
> > |___
> >||  | |
> >   8000 | -128TB | 87ff |8 TB | ... guard 
> > hole, also reserved for hypervisor
> > - 8800 | -120TB | c7ff |   64 TB | direct 
> > mapping of all physical memory (page_offset_base)
> > - c800 |  -56TB | c8ff |1 TB | ... unused 
> > hole
> > + 8800 | -120TB | 887f |  0.5 TB | LDT remap 
> > for PTI
> > + 8880 | -119.5  TB | c87f |   64 TB | direct 
> > mapping of all physical memory (page_offset_base)
> > + c880 |  -55.5  TB | c8ff |  0.5 TB | ... unused 
> > hole
> 
> Hi Kirill,
> 
> Thanks for this fix. One small concern is whether we can put LDT
> remap in other place, e.g shrink KASAN area and save one pgd size for
> it, Just from Redhat's enterprise relase point of view, we don't
> enable CONFIG_KASAN, and LDT is rarely used for server, now cutting one
> block from the direct mapping area and moving it up one pgd slot seems a
> little too abrupt. Does KASAN really cost 16 TB in 4-level and 8 PB in
> 5-level? After all the direct mapping is the core mapping and has been
> there always, LDT remap is kind of not so core and important mapping.
> Just a very perceptual feeling.

Sorry for late reply.

KASAN requires one byte of shadow memory per 8 bytes of target memory, so,
yeah, we need 16 TiB of virtual address space with 4-level paging.

With 5-level, we might save some address space as the limit for physical
address space if 52-bit, not 55. I dedicated 55-bit address space because
it was easier: just scale 4-level layout by factor of 9 and you'll get all
nicely aligned without much thought (PGD translates to PGD, etc).

There is also complication with KASAN layout. We have to have the same
KASAN_SHADOW_OFFSET between 4- and 5-level paging to make boot time
switching between paging modes work. The offset cannot be changed at
runtime: it used as parameter to compiler. That's the reason KASAN area
alignment looks strange.

A possibly better solution would be to actually include LDT in KASLR:
randomize the area along with direct mapping, vmalloc and vmemmap.
But it's more complexity than I found reasonable for a fix.

Do you want to try this? :)

-- 
 Kirill A. Shutemov


Re: [PATCH 3/3] lockdep: Use line-buffered printk() for lockdep messages.

2018-11-23 Thread Steven Rostedt
On Fri, 23 Nov 2018 13:46:47 +0100
Petr Mladek  wrote:

> Steven told me on Plumbers conference that even few initial
> characters saved him a day few times.

Yes, and that has happened more than once. I would reboot and retest
code that is crashing, and due to a triple fault, the machine would
reboot because of some race, and the little output I get from the
console would help tremendously.

Remember, debugging the kernel is a lot like forensics, especially when
it's from a customer's site. You look at all the evidence that you can
get, and sometimes it's just 10 characters in the output that gives you
an idea of where things went wrong. I'm really not liking the buffering
idea because of this.

-- Steve


Re: [PATCH 2/3] ntb_hw_switchtec: Added support of >=4G memory windows

2018-11-23 Thread kbuild test robot
Hi Paul,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.20-rc3 next-20181123]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Wesley-Sheng/ntb_hw_switchtec-Added-support-of-4G-memory-windows/20181123-231700
config: i386-randconfig-x077-201846 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All warnings (new ones prefixed by >>):

   drivers/ntb/hw/mscc/ntb_hw_switchtec.c: In function 
'switchtec_ntb_mw_set_direct':
>> drivers/ntb/hw/mscc/ntb_hw_switchtec.c:292:17: warning: right shift count >= 
>> width of type [-Wshift-count-overflow]
 iowrite32(size >> 32, &ctl->bar_ext_entry[bar].win_size);
^~
   drivers/ntb/hw/mscc/ntb_hw_switchtec.c: In function 'crosslink_setup_mws':
   drivers/ntb/hw/mscc/ntb_hw_switchtec.c:1061:18: warning: right shift count 
>= width of type [-Wshift-count-overflow]
  iowrite32(size >> 32, &ctl->bar_ext_entry[bar].win_size);
 ^~

vim +292 drivers/ntb/hw/mscc/ntb_hw_switchtec.c

   277  
   278  static void switchtec_ntb_mw_set_direct(struct switchtec_ntb *sndev, 
int idx,
   279  dma_addr_t addr, 
resource_size_t size)
   280  {
   281  int xlate_pos = ilog2(size);
   282  int bar = sndev->peer_direct_mw_to_bar[idx];
   283  struct ntb_ctrl_regs __iomem *ctl = sndev->mmio_peer_ctrl;
   284  u32 ctl_val;
   285  
   286  ctl_val = ioread32(&ctl->bar_entry[bar].ctl);
   287  ctl_val |= NTB_CTRL_BAR_DIR_WIN_EN;
   288  
   289  iowrite32(ctl_val, &ctl->bar_entry[bar].ctl);
   290  iowrite32(xlate_pos | (size & 0xF000),
   291&ctl->bar_entry[bar].win_size);
 > 292  iowrite32(size >> 32, &ctl->bar_ext_entry[bar].win_size);
   293  iowrite64(sndev->self_partition | addr,
   294&ctl->bar_entry[bar].xlate_addr);
   295  }
   296  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH v2] kmemleak: Turn kmemleak_lock to raw spinlock on RT

2018-11-23 Thread Steven Rostedt
On Fri, 23 Nov 2018 11:31:31 +
Catalin Marinas  wrote:

> With qwrlocks, the readers will normally block if there is a pending
> writer (to avoid starving the writer), unless in_interrupt() when the
> readers are allowed to starve a pending writer.
> 
> TLA+/PlusCal model here:  ;)
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/kernel-tla.git/tree/qrwlock.tla
> 


And the code appears to confirm it too:

void queued_read_lock_slowpath(struct qrwlock *lock)
{
/*
 * Readers come here when they cannot get the lock without waiting
 */
if (unlikely(in_interrupt())) {
/*
 * Readers in interrupt context will get the lock immediately
 * if the writer is just waiting (not holding the lock yet),
 * so spin with ACQUIRE semantics until the lock is available
 * without waiting in the queue.
 */
atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
return;
}
atomic_sub(_QR_BIAS, &lock->cnts);

/*
 * Put the reader into the wait queue
 */
arch_spin_lock(&lock->wait_lock);
atomic_add(_QR_BIAS, &lock->cnts);



-- Steve


Re: [RFC PATCH 3/3] mm, proc: report PR_SET_THP_DISABLE in proc

2018-11-23 Thread Vlastimil Babka
On 11/20/18 11:35 AM, Michal Hocko wrote:
> From: Michal Hocko 
> 
> David Rientjes has reported that 1860033237d4 ("mm: make
> PR_SET_THP_DISABLE immediately active") has changed the way how
> we report THPable VMAs to the userspace. Their monitoring tool is
> triggering false alarms on PR_SET_THP_DISABLE tasks because it considers
> an insufficient THP usage as a memory fragmentation resp. memory
> pressure issue.
> 
> Before the said commit each newly created VMA inherited VM_NOHUGEPAGE
> flag and that got exposed to the userspace via /proc//smaps file.
> This implementation had its downsides as explained in the commit message
> but it is true that the userspace doesn't have any means to query for
> the process wide THP enabled/disabled status.
> 
> PR_SET_THP_DISABLE is a process wide flag so it makes a lot of sense
> to export in the process wide context rather than per-vma. Introduce

Agreed.

> a new field to /proc//status which export this status.  If
> PR_SET_THP_DISABLE is used then it reports false same as when the THP is
> not compiled in. It doesn't consider the global THP status because we
> already export that information via sysfs
> 
> Fixes: 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active")
> Signed-off-by: Michal Hocko 

Acked-by: Vlastimil Babka 


Re: [PATCH] locking/atomics: build atomic headers as required

2018-11-23 Thread Mark Rutland
On Fri, Nov 23, 2018 at 03:33:21PM +, Mark Rutland wrote:
> Andrew and Ingo report that the check-atomics.sh script is simply too
> slow to run for every kernel build, and it's impractical to make it
> faster without rewriting it in something other than shell.
> 
> Rather than committing the generated headers, let's regenerate these
> as-required for a pristine tree.

> The diffstat looks nice, at least...

>  include/asm-generic/atomic-instrumented.h | 1787 --
>  include/asm-generic/atomic-long.h | 1012 -
>  include/linux/atomic-fallback.h   | 2294 
> -

As a heads-up, I generated the diff with git format-patch -D, to omit
the preimage for those ~5000 lines being deleted. Unfortunately, git-am
will refuse to apply that, with something like:

| error: removal patch leaves file contents
| error: include/asm-generic/atomic-instrumented.h: patch does not apply

I've pushed the patch to my atomics/regenerate branch [1] for now. I can resend
with the full context (or leaving the stale files for a subsequent deletion),
if that's preferable?

Thanks,
Mark.

[1] git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git 
atomics/regenerate


[PATCH 2/2] serial: mvebu-uart: initialize over sampling stack register

2018-11-23 Thread Miquel Raynal
The baudrate derivation relies on the state of the programmable over
sampling stack (OSAMP register) being empty, while never initializing
it.

Set all the fields of this register to 0 (except reserved areas) to
ensure a x16 divisor, as assumed by the driver.

The suspend/resume callbacks are untouched because they already
save/restore correctly this register.

Suggested-by: Thomas Petazzoni 
Signed-off-by: Miquel Raynal 
---
 drivers/tty/serial/mvebu-uart.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/serial/mvebu-uart.c b/drivers/tty/serial/mvebu-uart.c
index df6e4d8cdbd6..231f751d1ef4 100644
--- a/drivers/tty/serial/mvebu-uart.c
+++ b/drivers/tty/serial/mvebu-uart.c
@@ -73,6 +73,7 @@
 
 #define UART_OSAMP 0x14
 #define  OSAMP_DEFAULT_DIVISOR 16
+#define  OSAMP_DIVISORS_MASK   0x3F3F3F3F
 
 #define MVEBU_NR_UARTS 2
 
@@ -446,7 +447,7 @@ static int mvebu_uart_baud_rate_set(struct uart_port *port, 
unsigned int baud)
 {
struct mvebu_uart *mvuart = to_mvuart(port);
unsigned int d_divisor, m_divisor;
-   u32 brdv;
+   u32 brdv, osamp;
 
if (IS_ERR(mvuart->clk))
return -PTR_ERR(mvuart->clk);
@@ -469,6 +470,10 @@ static int mvebu_uart_baud_rate_set(struct uart_port 
*port, unsigned int baud)
brdv |= d_divisor;
writel(brdv, port->membase + UART_BRDV);
 
+   osamp = readl(port->membase + UART_OSAMP);
+   osamp &= ~OSAMP_DIVISORS_MASK;
+   writel(osamp, port->membase + UART_OSAMP);
+
return 0;
 }
 
-- 
2.19.1



[PATCH 1/2] serial: mvebu-uart: clarify the baud rate derivation

2018-11-23 Thread Miquel Raynal
The current comment in ->set_baud_rate() is rather incomplete as it
fails to describe what are the actual stages for the baudrate
derivation. Replace this comment with something more explicit and
close to the functional specification. Also adapt the variable names
to it.

Signed-off-by: Miquel Raynal 
---
 drivers/tty/serial/mvebu-uart.c | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/tty/serial/mvebu-uart.c b/drivers/tty/serial/mvebu-uart.c
index 170e446a2f62..df6e4d8cdbd6 100644
--- a/drivers/tty/serial/mvebu-uart.c
+++ b/drivers/tty/serial/mvebu-uart.c
@@ -72,6 +72,7 @@
 #define  BRDV_BAUD_MASK 0x3FF
 
 #define UART_OSAMP 0x14
+#define  OSAMP_DEFAULT_DIVISOR 16
 
 #define MVEBU_NR_UARTS 2
 
@@ -444,23 +445,28 @@ static void mvebu_uart_shutdown(struct uart_port *port)
 static int mvebu_uart_baud_rate_set(struct uart_port *port, unsigned int baud)
 {
struct mvebu_uart *mvuart = to_mvuart(port);
-   unsigned int baud_rate_div;
+   unsigned int d_divisor, m_divisor;
u32 brdv;
 
if (IS_ERR(mvuart->clk))
return -PTR_ERR(mvuart->clk);
 
/*
-* The UART clock is divided by the value of the divisor to generate
-* UCLK_OUT clock, which is 16 times faster than the baudrate.
-* This prescaler can achieve all standard baudrates until 230400.
-* Higher baudrates could be achieved for the extended UART by using the
-* programmable oversampling stack (also called fractional divisor).
+* The baudrate is derived from the UART clock thanks to two divisors:
+*   > D ("baud generator"): can divide the clock from 2 to 2^10 - 1.
+*   > M ("fractional divisor"): allows a better accuracy for
+* baudrates higher than 230400.
+*
+* As the derivation of M is rather complicated, the code sticks to its
+* default value (x16) when all the prescalers are zeroed, and only
+* makes use of D to configure the desired baudrate.
 */
-   baud_rate_div = DIV_ROUND_UP(port->uartclk, baud * 16);
+   m_divisor = OSAMP_DEFAULT_DIVISOR;
+   d_divisor = DIV_ROUND_UP(port->uartclk, baud * m_divisor);
+
brdv = readl(port->membase + UART_BRDV);
brdv &= ~BRDV_BAUD_MASK;
-   brdv |= baud_rate_div;
+   brdv |= d_divisor;
writel(brdv, port->membase + UART_BRDV);
 
return 0;
-- 
2.19.1



RE: [PATCH 4.19 00/42] 4.19.4-stable review

2018-11-23 Thread David Laight
From: Thomas Voegtle
> Sent: 23 November 2018 15:41
> On Fri, 23 Nov 2018, Greg Kroah-Hartman wrote:
> 
> > On Thu, Nov 22, 2018 at 09:53:35PM +0100, Thomas Voegtle wrote:
> >>
> >> Doesn't compile for me on OpenSuSE 15.0 (gcc 7.3.1):
> >>
> >>   CALLscripts/checksyscalls.sh
> >>   DESCEND  objtool
> >>   CHK include/generated/compile.h
> >>   CC [M]  drivers/gpu/drm/i915/i915_gem_gtt.o
> >> drivers/gpu/drm/i915/i915_gem_gtt.c: In function ‘gen6_dump_ppgtt’:
> >> drivers/gpu/drm/i915/i915_gem_gtt.c:1771:41: error: format ‘%llx’ expects
> >> argument of type ‘long long unsigned int’, but argument 5 has type ‘long
> >> unsigned int’ [-Werror=format=]
> >> seq_printf(m, "\t\t(%03d, %04d) %08llx: ",
> >> ~^
> >> %08lx
> >> cc1: all warnings being treated as errors
> >
> > Warnings treated as errors?  That's rough, sorry, don't do that :)
> 
> I had CONFIG_DRM_I915_WERROR=y for whatever reason.
> The whole thing confused me a lot.
> Sorry.

Well, if you compile that code for a 32bit architecture it is likely
to print garbage.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: [PATCH] dmaengine: ti: omap-dma: Configure LCH_TYPE for OMAP1

2018-11-23 Thread Russell King - ARM Linux
On Fri, Nov 23, 2018 at 02:35:04PM +0200, Peter Ujfalusi wrote:
> > Also we can't deal with the omap_set_dma_dest_burst_mode() setting -
> > DMAengine always uses a 64 byte burst, but udc wants a smaller burst
> > setting.  Does this matter?
> 
> The tusb also fiddled with the burst before the conversion, I believe
> what the DMAengine driver is doing should be fine. If not then we fix it
> while converting the omap_udc.

That's good news, so I can ignore that difference.

> > I'm also not sure about this:
> > 
> > if (cpu_is_omap15xx())
> > end++;
> > 
> > in dma_dest_len() - is that missing from the omap-dma driver?  It looks
> > like a work-around for some problem on OMAP15xx, but I can't make sense
> > about why it's in the UDC driver rather than the legacy DMA driver.
> 
> afaik no other legacy drivers were doing similar thing, this must be
> something which is needed for the omap_udc driver to fix up something?
> 
> > 
> > I'm also confused by:
> > 
> > end |= start & (0x << 16);
> > 
> > also in dma_dest_len() - omap_get_dma_dst_pos() returns in the high 16
> > bits the full address:
> > 
> > if (dma_omap1())
> > offset |= (p->dma_read(CDSA, lch) & 0x);
> 
> CDSA is OMAP_DMA_REG_2X16BIT for omap1
> The CPC/CDAC holds the LSB of the _current_ DMA pointer. The code gets
> the MSB of the address from the CDSA registers.
> 
> > 
> > so if the address crosses a 64k physical address boundary, surely orring
> > in the start address is wrong?  The more I look at dma_dest_len(), the
> > more I wonder whether that and dma_src_len() are anywhere near correct,
> > and whether that is a source of breakage for Aaro.
> 
> Hrm, again... the position reporting on OMAP1 is certainly not something
> I would put my life on :o

My feeling is - if the code in plat-omap/dma.c doesn't work, we've got
the same problems in the dmaengine driver, so the reported residue will
be wrong.  Any workarounds need to be within the dmaengine driver, not
in individual drivers.  We can't just go subtracting 1 from the residue
reported by dmaengine.

> > diff --git a/drivers/usb/gadget/udc/omap_udc.c 
> > b/drivers/usb/gadget/udc/omap_udc.c
> > index 3a16431da321..a37e1d2f0f3e 100644
> > --- a/drivers/usb/gadget/udc/omap_udc.c
> > +++ b/drivers/usb/gadget/udc/omap_udc.c
> > @@ -204,6 +204,7 @@ static int omap_ep_enable(struct usb_ep *_ep,
> > ep->dma_channel = 0;
> > ep->has_dma = 0;
> > ep->lch = -1;
> > +   ep->dma = NULL;
> > use_ep(ep, UDC_EP_SEL);
> > omap_writew(udc->clr_halt, UDC_CTRL);
> > ep->ackwait = 0;
> > @@ -576,21 +577,49 @@ static void finish_in_dma(struct omap_ep *ep, struct 
> > omap_req *req, int status)
> >  static void next_out_dma(struct omap_ep *ep, struct omap_req *req)
> >  {
> > unsigned packets = req->req.length - req->req.actual;
> > -   int dma_trigger = 0;
> > +   struct dma_async_tx_descriptor *tx;
> > +   struct dma_chan *dma = ep->dma;
> > +   dma_cookie_t cookie;
> > u16 w;
> >  
> > -   /* set up this DMA transfer, enable the fifo, start */
> > -   packets /= ep->ep.maxpacket;
> > -   packets = min(packets, (unsigned)UDC_RXN_TC + 1);
> > +   packets = min_t(unsigned, packets / ep->ep.maxpacket, UDC_RXN_TC + 1);
> > req->dma_bytes = packets * ep->ep.maxpacket;
> > -   omap_set_dma_transfer_params(ep->lch, OMAP_DMA_DATA_TYPE_S16,
> > -   ep->ep.maxpacket >> 1, packets,
> > -   OMAP_DMA_SYNC_ELEMENT,
> > -   dma_trigger, 0);
> > -   omap_set_dma_dest_params(ep->lch, OMAP_DMA_PORT_EMIFF,
> > -   OMAP_DMA_AMODE_POST_INC, req->req.dma + req->req.actual,
> > -   0, 0);
> > -   ep->dma_counter = omap_get_dma_dst_pos(ep->lch);
> > +
> > +   if (dma) {
> > +   struct dma_slave_config cfg = {
> > +   .direction = DMA_DEV_TO_MEM,
> > +   .src_addr_width = DMA_SLAVE_BUSWIDTH_2_BYTE,
> > +   /*
> > +* DMAengine uses frame sync mode, setting maxburst=1
> > +* is equivalent to element sync mode.
> > +*/
> > +   .src_maxburst = 1,
> 
> We should fix the omap-dma driver for slave_sg  instead:
> 
> if (!burst)
>   burst = 1;
> 
> I thought that I already did that.

It isn't in 4.19, and I see no changes between 4.19 and 4.20-rc for
ti/omap-dma.c.

> > +   struct dma_chan *dma;
> > +
> > dma_channel = OMAP_DMA_USB_W2FC_RX0 - 1 + channel;
> > -   status = omap_request_dma(dma_channel,
> > -   ep->ep.name, dma_error, ep, &ep->lch);
> > -   if (status == 0) {
> > +
> > +   dma = __dma_request_channel(&mask, omap_dma_filter_fn,
> > +   (void *)dma_channel);
> 
> dma_request_chan(dev, "ch_name");
> 
> where ch_name: rx0/1/2, tx0/1/2
> 
> and we don't need the omap_dma_filter_fn in here as all taken care via
> the dma_slave_map

Yea, we 

Re: [PATCH net-next 3/3] vhost: don't touch avail ring if in_order is negotiated

2018-11-23 Thread Michael S. Tsirkin
On Fri, Nov 23, 2018 at 11:00:16AM +0800, Jason Wang wrote:
> Device use descriptors table in order, so there's no need to read
> index from available ring. This eliminate the cache contention on
> avail ring completely.

Well this isn't what the in order feature says in the spec.

It forces the used ring to be in the same order as
the available ring. So I don't think you can skip
checking the available ring. And in fact depending on
ring size and workload, using all of descriptor buffer might
cause a slowdown.
Rather you should be able to get
about the same speedup, but from skipping checking
the used ring in virtio.


> Virito-user + vhost_kernel + XDP_DROP gives about ~10% improvement on
> TX from 4.8Mpps to 5.3Mpps on Intel(R) Core(TM) i7-5600U CPU @
> 2.60GHz.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vhost/vhost.c | 19 ---
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 3a5f81a66d34..c8be151bc897 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2002,6 +2002,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
>   __virtio16 avail_idx;
>   __virtio16 ring_head;
>   int ret, access;
> + bool in_order = vhost_has_feature(vq, VIRTIO_F_IN_ORDER);
>  
>   /* Check it isn't doing very strange things with descriptor numbers. */
>   last_avail_idx = vq->last_avail_idx;
> @@ -2034,15 +2035,19 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
>  
>   /* Grab the next descriptor number they're advertising, and increment
>* the index we've seen. */
> - if (unlikely(vhost_get_avail(vq, ring_head,
> -  &vq->avail->ring[last_avail_idx & (vq->num - 1)]))) {
> - vq_err(vq, "Failed to read head: idx %d address %p\n",
> -last_avail_idx,
> -&vq->avail->ring[last_avail_idx % vq->num]);
> - return -EFAULT;
> + if (!in_order) {
> + if (unlikely(vhost_get_avail(vq, ring_head,
> + &vq->avail->ring[last_avail_idx & (vq->num - 1)]))) {
> + vq_err(vq, "Failed to read head: idx %d address %p\n",
> + last_avail_idx,
> + &vq->avail->ring[last_avail_idx % vq->num]);
> + return -EFAULT;
> + }
> + head = vhost16_to_cpu(vq, ring_head);
> + } else {
> + head = last_avail_idx & (vq->num - 1);
>   }
>  
> - head = vhost16_to_cpu(vq, ring_head);
>  
>   /* If their number is silly, that's an error. */
>   if (unlikely(head >= vq->num)) {
> -- 
> 2.17.1


Re: [PATCH 4.19 00/42] 4.19.4-stable review

2018-11-23 Thread Thomas Voegtle

On Fri, 23 Nov 2018, Greg Kroah-Hartman wrote:


On Thu, Nov 22, 2018 at 09:53:35PM +0100, Thomas Voegtle wrote:


Doesn't compile for me on OpenSuSE 15.0 (gcc 7.3.1):

  CALLscripts/checksyscalls.sh
  DESCEND  objtool
  CHK include/generated/compile.h
  CC [M]  drivers/gpu/drm/i915/i915_gem_gtt.o
drivers/gpu/drm/i915/i915_gem_gtt.c: In function ‘gen6_dump_ppgtt’:
drivers/gpu/drm/i915/i915_gem_gtt.c:1771:41: error: format ‘%llx’ expects
argument of type ‘long long unsigned int’, but argument 5 has type ‘long
unsigned int’ [-Werror=format=]
seq_printf(m, "\t\t(%03d, %04d) %08llx: ",
~^
%08lx
cc1: all warnings being treated as errors


Warnings treated as errors?  That's rough, sorry, don't do that :)


I had CONFIG_DRM_I915_WERROR=y for whatever reason. 
The whole thing confused me a lot.

Sorry.

  Thomas


Re: [PATCH 09/17] irqchip: bcm283x: Switch to SPDX identifier

2018-11-23 Thread Marc Zyngier
On 23/11/2018 15:23, Stefan Wahren wrote:
> Hi Jason,
> hi Marc,
> 
> Am 10.11.18 um 16:54 schrieb Stefan Wahren:
>> Adopt the SPDX license identifier headers to ease license compliance
>> management.
>>
>> Cc: Simon Arlott 
>> Cc: Eric Anholt 
>> Signed-off-by: Stefan Wahren 
>> ---
>>  drivers/irqchip/irq-bcm2835.c | 11 +--
>>  drivers/irqchip/irq-bcm2836.c | 11 +--
>>  2 files changed, 2 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/irqchip/irq-bcm2835.c b/drivers/irqchip/irq-bcm2835.c
>> index d2da8a1..418245d 100644
>> --- a/drivers/irqchip/irq-bcm2835.c
>> +++ b/drivers/irqchip/irq-bcm2835.c
>> @@ -1,17 +1,8 @@
>> +// SPDX-License-Identifier: GPL-2.0+
>>  /*
>>   * Copyright 2010 Broadcom
>>   * Copyright 2012 Simon Arlott, Chris Boot, Stephen Warren
>>   *
>> - * This program is free software; you can redistribute it and/or modify
>> - * it under the terms of the GNU General Public License as published by
>> - * the Free Software Foundation; either version 2 of the License, or
>> - * (at your option) any later version.
>> - *
>> - * This program is distributed in the hope that it will be useful,
>> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> - * GNU General Public License for more details.
>> - *
>>   * Quirk 1: Shortcut interrupts don't set the bank 1/2 register pending bits
>>   *
>>   * If an interrupt fires on bank 1 that isn't in the shortcuts list, bit 8
>> diff --git a/drivers/irqchip/irq-bcm2836.c b/drivers/irqchip/irq-bcm2836.c
>> index dfe4a46..2038693 100644
>> --- a/drivers/irqchip/irq-bcm2836.c
>> +++ b/drivers/irqchip/irq-bcm2836.c
>> @@ -1,17 +1,8 @@
>> +// SPDX-License-Identifier: GPL-2.0+
>>  /*
>>   * Root interrupt controller for the BCM2836 (Raspberry Pi 2).
>>   *
>>   * Copyright 2015 Broadcom
>> - *
>> - * This program is free software; you can redistribute it and/or modify
>> - * it under the terms of the GNU General Public License as published by
>> - * the Free Software Foundation; either version 2 of the License, or
>> - * (at your option) any later version.
>> - *
>> - * This program is distributed in the hope that it will be useful,
>> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> - * GNU General Public License for more details.
>>   */
>>  
>>  #include 
> 
> Eric already made a review. Are you fine with it?

Sure. How do you want this to go in? If you have a series repainting
multiple subsystems, I'd rather it stays as such, and you can stick my

Acked-by: Marc Zyngier 

on it.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


[PATCH] locking/atomics: build atomic headers as required

2018-11-23 Thread Mark Rutland
Andrew and Ingo report that the check-atomics.sh script is simply too
slow to run for every kernel build, and it's impractical to make it
faster without rewriting it in something other than shell.

Rather than committing the generated headers, let's regenerate these
as-required for a pristine tree.

That ensures they're always up-to-date, allows them to be built in
parallel, and avoid redundant rebuilds, which is a 2-8s saving per
incremental build. Since the results are not committed, it's very
obvious that they should not be modified directly. If we need to
generate more headers in future, it's easy to extend Makefile.genheader
to permit this.

I've verified that this works in the cases we previously had issues with
(out-of-tree builds and where scripts have no execute permissions), and
have tested these cases for both x86_64 and arm64.

The diffstat looks nice, at least...

Signed-off-by: Mark Rutland 
Cc: Andrew Morton 
Cc: Boqun Feng 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Will Deacon 
Cc: linux-kernel@vger.kernel.org
---
 Kbuild|   18 +-
 Makefile  |8 +-
 arch/arm64/include/asm/atomic.h   |2 +-
 arch/x86/include/asm/atomic.h |2 +-
 include/asm-generic/atomic-instrumented.h | 1787 --
 include/asm-generic/atomic-long.h | 1012 -
 include/linux/atomic-fallback.h   | 2294 -
 include/linux/atomic.h|4 +-
 scripts/Makefile.genheader|   23 +
 scripts/atomic/check-atomics.sh   |   19 -
 10 files changed, 35 insertions(+), 5134 deletions(-)
 delete mode 100644 include/asm-generic/atomic-instrumented.h
 delete mode 100644 include/asm-generic/atomic-long.h
 delete mode 100644 include/linux/atomic-fallback.h
 create mode 100644 scripts/Makefile.genheader
 delete mode 100755 scripts/atomic/check-atomics.sh

Andrew, Ingo, 

Could you please let me know whether you're happy with this?

This is based on the tip locking/core branch.

Locally, for a -j64 build this saves me ~2s for a pristine tree, and ~5s
for a rebuild:

  # before, pristine
  make -j64 -s  195.32s user 15.65s system 1670% cpu 12.631 total

  # before, rebuild
  make -j64 -s  2.55s user 1.64s system 64% cpu 6.468 total

  # after, pristine
  make -j64 -s  194.46s user 15.82s system 1965% cpu 10.701 total

  # after, rebuild
  make -j64 -s  2.34s user 1.02s system 179% cpu 1.865 total

Thanks,
Mark.

diff --git a/Kbuild b/Kbuild
index 780048056ac5..005304205482 100644
--- a/Kbuild
+++ b/Kbuild
@@ -6,8 +6,7 @@
 # 2) Generate timeconst.h
 # 3) Generate asm-offsets.h (may need bounds.h and timeconst.h)
 # 4) Check for missing system calls
-# 5) check atomics headers are up-to-date
-# 6) Generate constants.py (may need bounds.h)
+# 5) Generate constants.py (may need bounds.h)
 
 #
 # 1) Generate bounds.h
@@ -73,20 +72,7 @@ missing-syscalls: scripts/checksyscalls.sh $(offsets-file) 
FORCE
$(call cmd,syscalls)
 
 #
-# 5) Check atomic headers are up-to-date
-#
-
-always += old-atomics
-targets += old-atomics
-
-quiet_cmd_atomics = CALL$<
-  cmd_atomics = $(CONFIG_SHELL) $<
-
-old-atomics: scripts/atomic/check-atomics.sh FORCE
-   $(call cmd,atomics)
-
-#
-# 6) Generate constants for Python GDB integration
+# 5) Generate constants for Python GDB integration
 #
 
 extra-$(CONFIG_GDB_SCRIPTS) += build_constants_py
diff --git a/Makefile b/Makefile
index 6c40d547513c..fd871d2c2f54 100644
--- a/Makefile
+++ b/Makefile
@@ -224,7 +224,7 @@ clean-targets := %clean mrproper cleandocs
 no-dot-config-targets := $(clean-targets) \
 cscope gtags TAGS tags help% %docs check% coccicheck \
 $(version_h) headers_% archheaders archscripts \
-%asm-generic kernelversion %src-pkg
+%asm-generic genheader kernelversion %src-pkg
 no-sync-config-targets := $(no-dot-config-targets) install %install \
   kernelrelease
 
@@ -1089,7 +1089,7 @@ endif
 # prepare2 creates a makefile if using a separate output directory.
 # From this point forward, .config has been reprocessed, so any rules
 # that need to depend on updated CONFIG_* values can be checked here.
-prepare2: prepare3 outputmakefile asm-generic
+prepare2: prepare3 outputmakefile asm-generic genheader
 
 prepare1: prepare2 $(version_h) $(autoksyms_h) include/generated/utsrelease.h
$(cmd_crmodverdir)
@@ -1113,6 +1113,10 @@ uapi-asm-generic:
$(Q)$(MAKE) -f $(srctree)/scripts/Makefile.asm-generic \
src=uapi/asm obj=arch/$(SRCARCH)/include/generated/uapi/asm
 
+PHONY += genheader
+genheader:
+   $(Q)$(MAKE) -f $(srctree)/scripts/Makefile.genheader 
obj=include/generated
+
 PHONY += prepare-objtool
 prepare-objtool: $(objtool_target)
 
diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/at

Re: [RFC PATCH 2/3] mm, thp, proc: report THP eligibility for each vma

2018-11-23 Thread Vlastimil Babka
On 11/23/18 4:21 PM, Michal Hocko wrote:
> On Fri 23-11-18 16:07:06, Vlastimil Babka wrote:
>> On 11/20/18 11:35 AM, Michal Hocko wrote:
>>> From: Michal Hocko 
>>>
>>> Userspace falls short when trying to find out whether a specific memory
>>> range is eligible for THP. There are usecases that would like to know
>>> that
>>> http://lkml.kernel.org/r/alpine.deb.2.21.1809251248450.50...@chino.kir.corp.google.com
>>> : This is used to identify heap mappings that should be able to fault thp
>>> : but do not, and they normally point to a low-on-memory or fragmentation
>>> : issue.
>>>
>>> The only way to deduce this now is to query for hg resp. nh flags and
>>> confronting the state with the global setting. Except that there is
>>> also PR_SET_THP_DISABLE that might change the picture. So the final
>>> logic is not trivial. Moreover the eligibility of the vma depends on
>>> the type of VMA as well. In the past we have supported only anononymous
>>> memory VMAs but things have changed and shmem based vmas are supported
>>> as well these days and the query logic gets even more complicated
>>> because the eligibility depends on the mount option and another global
>>> configuration knob.
>>>
>>> Simplify the current state and report the THP eligibility in
>>> /proc//smaps for each existing vma. Reuse transparent_hugepage_enabled
>>> for this purpose. The original implementation of this function assumes
>>> that the caller knows that the vma itself is supported for THP so make
>>> the core checks into __transparent_hugepage_enabled and use it for
>>> existing callers. __show_smap just use the new transparent_hugepage_enabled
>>> which also checks the vma support status (please note that this one has
>>> to be out of line due to include dependency issues).
>>>
>>> Signed-off-by: Michal Hocko 
>>
>> Not thrilled by this,
> 
> Any specific concern?

The kitchen sink that smaps slowly becomes, with associated overhead
(i.e. one of reasons there's now smaps_rollup). Would be much nicer if
userspace had some way to say which fields it's interested in. But I
have no good ideas for that right now :/


Re: dw_mmc: IDMAC Invalidate cache after read

2018-11-23 Thread Robin Murphy

Hi Jan,

[repeating some of the discussion from your other thread for the benefit 
of the MMC audience]


On 21/11/2018 07:42, JABLONSKY Jan wrote:

CPU may not see most up-to-date and correct copy of DMA buffer, when
internal DMA controller is in use.
Problem appears on The Altera SoC FPGA (uses integrated DMA controller),
during higher CPU and system memory load

Signed-off-by: Jan Jablonsky 
---
  drivers/mmc/host/dw_mmc.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index 80dc2fd..63873d9 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -499,8 +499,7 @@ static void dw_mci_dmac_complete_dma(void *arg)
  
  	dev_vdbg(host->dev, "DMA complete\n");
  
-	if ((host->use_dma == TRANS_MODE_EDMAC) &&

-   data && (data->flags & MMC_DATA_READ))
+   if (data && (data->flags & MMC_DATA_READ))
/* Invalidate cache after read */
dma_sync_sg_for_cpu(mmc_dev(host->slot->mmc),
data->sg,


It looks very dubious whether this is actually the right thing to do. 
Just considering this driver, edma has an complementary sync_sg call in 
its .start method, so if idma needed this one, logically shouldn't it 
also need the other one as well?


However, from a DMA API point of view, these syncs make no sense either 
way - the very next thing we do here is call host->dma_ops->cleanup(), 
which calls dma_unmap_sg(), which will perform the appropriate cache 
maintenance anyway. Thus I can't see why this code is even here to begin 
with. Similarly on the request path - the sg list really shouldn't have 
been touched since being mapped in dw_mci_pre_dma_transfer(), so that 
sync should also be an effective no-op unless it's papering over some 
race condition elsewhere.


Shawn - do you remember why these syncs were added in 3fc7eaef44dbc? 
Were you seeing actual coherency issues on RK31xx SoCs, or was it 
perhaps just some leftover or misunderstanding which missed getting 
cleaned up?


Robin.


<    1   2   3   4   5   >