Re: [PATCH 2/2] ARM: Wrap '--pic-veneer' with ld-option

2018-12-05 Thread Nick Desaulniers
On Wed, Dec 5, 2018 at 12:10 AM Ard Biesheuvel
 wrote:
>
> (+ Arnd)
>
> On Wed, 5 Dec 2018 at 09:06, Nathan Chancellor  
> wrote:
> >
> > On Wed, Dec 05, 2018 at 08:37:05AM +0100, Ard Biesheuvel wrote:
> > > On Wed, 5 Dec 2018 at 02:42, Nathan Chancellor  
> > > wrote:
> > > >
> > > > This flag is not supported by lld:
> > > >
> > > > ld.lld: error: unknown argument: --pic-veneer
> > > >
> > > > Signed-off-by: Nathan Chancellor 
> > >
> > > Hi Nate,
> > >
> > > Does this mean ld.lld is guaranteed to produce position independent
> > > veneers if you build kernels that are bigger than the typical range of
> > > a relative branch?

(+ Peter) who might be able to answer this.

> > >
> >
> > Hi Ard,
> >
> > Honestly, I'm not quite sure. I saw your commit that introduced this
> > flag and I wasn't quite sure what to make of it for lld. What
> > configuration would I use to verify and what would I check for?
> >
>
> Try building allyesconfig, and check the resulting binary for veneers
> (which have 'veneer' in the symbol name, at least when ld.bfd emits
> them). These veneers should not take the [virtual] address of the
> branch target directly, but take a PC relative offset (as in the
> example in the commit log of that patch you are referring to)

Not sure clang can compile an arm32 allyesconfig (haven't tried
personally, yet, TODO), but we could try gcc+lld.

>
> > Additionally, I have filed an LLVM bug for the lld developers to
> > check and see if this is a flag they should support:
> >
> > https://bugs.llvm.org/show_bug.cgi?id=39886
> >
> > Thanks for the quick reply,
> > Nathan
> >
> > > > ---
> > > >  arch/arm/Makefile | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/arm/Makefile b/arch/arm/Makefile
> > > > index e2a0baf36766..4fab2aa29570 100644
> > > > --- a/arch/arm/Makefile
> > > > +++ b/arch/arm/Makefile
> > > > @@ -10,7 +10,7 @@
> > > >  #
> > > >  # Copyright (C) 1995-2001 by Russell King
> > > >
> > > > -LDFLAGS_vmlinux:= --no-undefined -X --pic-veneer
> > > > +LDFLAGS_vmlinux:= --no-undefined -X $(call 
> > > > ld-option,--pic-veneer)
> > > >  ifeq ($(CONFIG_CPU_ENDIAN_BE8),y)
> > > >  LDFLAGS_vmlinux+= --be8
> > > >  KBUILD_LDFLAGS_MODULE  += --be8
> > > > --
> > > > 2.20.0.rc1
> > > >



-- 
Thanks,
~Nick Desaulniers


Re: [PATCH 2/2] ARM: Wrap '--pic-veneer' with ld-option

2018-12-05 Thread Nick Desaulniers
On Wed, Dec 5, 2018 at 12:10 AM Ard Biesheuvel
 wrote:
>
> (+ Arnd)
>
> On Wed, 5 Dec 2018 at 09:06, Nathan Chancellor  
> wrote:
> >
> > On Wed, Dec 05, 2018 at 08:37:05AM +0100, Ard Biesheuvel wrote:
> > > On Wed, 5 Dec 2018 at 02:42, Nathan Chancellor  
> > > wrote:
> > > >
> > > > This flag is not supported by lld:
> > > >
> > > > ld.lld: error: unknown argument: --pic-veneer
> > > >
> > > > Signed-off-by: Nathan Chancellor 
> > >
> > > Hi Nate,
> > >
> > > Does this mean ld.lld is guaranteed to produce position independent
> > > veneers if you build kernels that are bigger than the typical range of
> > > a relative branch?

(+ Peter) who might be able to answer this.

> > >
> >
> > Hi Ard,
> >
> > Honestly, I'm not quite sure. I saw your commit that introduced this
> > flag and I wasn't quite sure what to make of it for lld. What
> > configuration would I use to verify and what would I check for?
> >
>
> Try building allyesconfig, and check the resulting binary for veneers
> (which have 'veneer' in the symbol name, at least when ld.bfd emits
> them). These veneers should not take the [virtual] address of the
> branch target directly, but take a PC relative offset (as in the
> example in the commit log of that patch you are referring to)

Not sure clang can compile an arm32 allyesconfig (haven't tried
personally, yet, TODO), but we could try gcc+lld.

>
> > Additionally, I have filed an LLVM bug for the lld developers to
> > check and see if this is a flag they should support:
> >
> > https://bugs.llvm.org/show_bug.cgi?id=39886
> >
> > Thanks for the quick reply,
> > Nathan
> >
> > > > ---
> > > >  arch/arm/Makefile | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/arm/Makefile b/arch/arm/Makefile
> > > > index e2a0baf36766..4fab2aa29570 100644
> > > > --- a/arch/arm/Makefile
> > > > +++ b/arch/arm/Makefile
> > > > @@ -10,7 +10,7 @@
> > > >  #
> > > >  # Copyright (C) 1995-2001 by Russell King
> > > >
> > > > -LDFLAGS_vmlinux:= --no-undefined -X --pic-veneer
> > > > +LDFLAGS_vmlinux:= --no-undefined -X $(call 
> > > > ld-option,--pic-veneer)
> > > >  ifeq ($(CONFIG_CPU_ENDIAN_BE8),y)
> > > >  LDFLAGS_vmlinux+= --be8
> > > >  KBUILD_LDFLAGS_MODULE  += --be8
> > > > --
> > > > 2.20.0.rc1
> > > >



-- 
Thanks,
~Nick Desaulniers


Re: [PATCH v3] signal: add procfd_send_signal() syscall

2018-12-05 Thread Eric W. Biederman
Christian Brauner  writes:

> The kill() syscall operates on process identifiers (pid). After a process
> has exited its pid can be reused by another process. If a caller sends a
> signal to a reused pid it will end up signaling the wrong process. This
> issue has often surfaced and there has been a push [1] to address this
> problem.
>
> This patch uses file descriptors (fd) from proc/ as stable handles on
> struct pid. Even if a pid is recycled the handle will not change. The fd
> can be used to send signals to the process it refers to.
> Thus, the new syscall procfd_send_signal() is introduced to solve this
> problem. Instead of pids it operates on process fds (procfd).
>
> /* prototype and argument /*
> long procfd_send_signal(int fd, int sig, siginfo_t *info, unsigned int flags);
>
> In addition to the procfd and signal argument it takes an additional
> siginfo_t and flags argument. If the siginfo_t argument is NULL then
> procfd_send_signal() behaves like kill(). If it is not NULL
> procfd_send_signal() behaves like rt_sigqueueinfo().
> The flags argument is added to allow for future extensions of this syscall.
> It currently needs to be passed as 0. Failing to do so will cause EINVAL.
>
> /* procfd_send_signal() replaces multiple pid-based syscalls */
> The procfd_send_signal() syscall currently takes on the job of the
> following syscalls that operate on pids:
> - kill(2)
> - rt_sigqueueinfo(2)
> The syscall is defined in such a way that it can also operate on thread fds
> instead of process fds. In a future patchset I will extend it to operate on
> procfds from /proc//task/ at which point it will additionally
> take on the job of:
> - tgkill(2)
> - rt_tgsigqueueinfo(2)
> Right now this is intentionally left out to keep this patchset as simple as
> possible (cf. [4]). If a procfd of /proc//task/ is passed
> EOPNOTSUPP will be returned to give userspace a way to detect when I add
> support for such procfds {cf. [10]).
>
> /* naming */
> The name procfd_send_signal() was chosen to reflect the fact that it takes
> on the job of multiple syscalls. It is intentional that the name is not
> reminiscent of neither kill(2) nor rt_sigqueueinfo(2). Not the fomer
> because it might imply that procfd_send_signal() is only a replacement for
> kill(2) and not the latter because it is a hazzle to remember the correct
> spelling (especially for non-native speakers) and because it is not
> descriptive enough of what the syscall actually does. The name
> "procfd_send_signal" makes it very clear that its job is to send signals.
>
> /* O_PATH file descriptors */
> procfds opened as O_PATH fds cannot be used to send signals to a process
> (cf. [2]). Signaling processes through procfds is the equivalent of writing
> to a file. Thus, this is not an operation that operates "purely at the
> file descriptor level" as required by the open(2) manpage.
>
> /* zombies */
> Zombies can be signaled just as any other process. No special error will be
> reported since a zombie state is an unreliable state (cf. [3]).
>
> /* cross-namespace signals */
> The patch currently enforces that the signaler and signalee either are in
> the same pid namespace or that the signaler's pid namespace is an ancestor
> of the signalee's pid namespace. This is done for the sake of simplicity
> and because it is unclear to what values certain members of struct
> siginfo_t would need to be set to (cf. [5], [6]).
>
> /* compat syscalls */
> It became clear that we would like to avoid adding compat syscalls (cf.
> [7]). The compat syscall handling is now done in kernel/signal.c itself by
> adding __copy_siginfo_from_user_generic() which lets us avoid compat
> syscalls (cf. [8]).
> With upcoming rework for syscall handling things might improve even further
> (cf. [11]).
> This patch was tested on x64 and x86.
>
> /* userspace usage */
> An asciinema recording for the basic functionality can be found under [9].
> With this patch a process can be killed via:
>
>  #define _GNU_SOURCE
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>
>  static inline int do_procfd_send_signal(int procfd, int sig, siginfo_t *info,
>  unsigned int flags)
>  {
>  #ifdef __NR_procfd_send_signal
>  return syscall(__NR_procfd_send_signal, procfd, sig, info, flags);
>  #else
>  return -ENOSYS;
>  #endif
>  }
>
>  int main(int argc, char *argv[])
>  {
>  int fd, ret, saved_errno, sig;
>
>  if (argc < 3)
>  exit(EXIT_FAILURE);
>
>  fd = open(argv[1], O_DIRECTORY | O_CLOEXEC);
>  if (fd < 0) {
>  printf("%s - Failed to open \"%s\"\n", strerror(errno), 
> argv[1]);
>  exit(EXIT_FAILURE);
>  }
>
>  sig = atoi(argv[2]);
>
>  printf("Sending signal %d to process %s\n", sig, argv[1]);
>  ret = do_procfd_send_signal(fd, sig, NULL, 0);
>
> 

Re: [PATCH v3] signal: add procfd_send_signal() syscall

2018-12-05 Thread Eric W. Biederman
Christian Brauner  writes:

> The kill() syscall operates on process identifiers (pid). After a process
> has exited its pid can be reused by another process. If a caller sends a
> signal to a reused pid it will end up signaling the wrong process. This
> issue has often surfaced and there has been a push [1] to address this
> problem.
>
> This patch uses file descriptors (fd) from proc/ as stable handles on
> struct pid. Even if a pid is recycled the handle will not change. The fd
> can be used to send signals to the process it refers to.
> Thus, the new syscall procfd_send_signal() is introduced to solve this
> problem. Instead of pids it operates on process fds (procfd).
>
> /* prototype and argument /*
> long procfd_send_signal(int fd, int sig, siginfo_t *info, unsigned int flags);
>
> In addition to the procfd and signal argument it takes an additional
> siginfo_t and flags argument. If the siginfo_t argument is NULL then
> procfd_send_signal() behaves like kill(). If it is not NULL
> procfd_send_signal() behaves like rt_sigqueueinfo().
> The flags argument is added to allow for future extensions of this syscall.
> It currently needs to be passed as 0. Failing to do so will cause EINVAL.
>
> /* procfd_send_signal() replaces multiple pid-based syscalls */
> The procfd_send_signal() syscall currently takes on the job of the
> following syscalls that operate on pids:
> - kill(2)
> - rt_sigqueueinfo(2)
> The syscall is defined in such a way that it can also operate on thread fds
> instead of process fds. In a future patchset I will extend it to operate on
> procfds from /proc//task/ at which point it will additionally
> take on the job of:
> - tgkill(2)
> - rt_tgsigqueueinfo(2)
> Right now this is intentionally left out to keep this patchset as simple as
> possible (cf. [4]). If a procfd of /proc//task/ is passed
> EOPNOTSUPP will be returned to give userspace a way to detect when I add
> support for such procfds {cf. [10]).
>
> /* naming */
> The name procfd_send_signal() was chosen to reflect the fact that it takes
> on the job of multiple syscalls. It is intentional that the name is not
> reminiscent of neither kill(2) nor rt_sigqueueinfo(2). Not the fomer
> because it might imply that procfd_send_signal() is only a replacement for
> kill(2) and not the latter because it is a hazzle to remember the correct
> spelling (especially for non-native speakers) and because it is not
> descriptive enough of what the syscall actually does. The name
> "procfd_send_signal" makes it very clear that its job is to send signals.
>
> /* O_PATH file descriptors */
> procfds opened as O_PATH fds cannot be used to send signals to a process
> (cf. [2]). Signaling processes through procfds is the equivalent of writing
> to a file. Thus, this is not an operation that operates "purely at the
> file descriptor level" as required by the open(2) manpage.
>
> /* zombies */
> Zombies can be signaled just as any other process. No special error will be
> reported since a zombie state is an unreliable state (cf. [3]).
>
> /* cross-namespace signals */
> The patch currently enforces that the signaler and signalee either are in
> the same pid namespace or that the signaler's pid namespace is an ancestor
> of the signalee's pid namespace. This is done for the sake of simplicity
> and because it is unclear to what values certain members of struct
> siginfo_t would need to be set to (cf. [5], [6]).
>
> /* compat syscalls */
> It became clear that we would like to avoid adding compat syscalls (cf.
> [7]). The compat syscall handling is now done in kernel/signal.c itself by
> adding __copy_siginfo_from_user_generic() which lets us avoid compat
> syscalls (cf. [8]).
> With upcoming rework for syscall handling things might improve even further
> (cf. [11]).
> This patch was tested on x64 and x86.
>
> /* userspace usage */
> An asciinema recording for the basic functionality can be found under [9].
> With this patch a process can be killed via:
>
>  #define _GNU_SOURCE
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>
>  static inline int do_procfd_send_signal(int procfd, int sig, siginfo_t *info,
>  unsigned int flags)
>  {
>  #ifdef __NR_procfd_send_signal
>  return syscall(__NR_procfd_send_signal, procfd, sig, info, flags);
>  #else
>  return -ENOSYS;
>  #endif
>  }
>
>  int main(int argc, char *argv[])
>  {
>  int fd, ret, saved_errno, sig;
>
>  if (argc < 3)
>  exit(EXIT_FAILURE);
>
>  fd = open(argv[1], O_DIRECTORY | O_CLOEXEC);
>  if (fd < 0) {
>  printf("%s - Failed to open \"%s\"\n", strerror(errno), 
> argv[1]);
>  exit(EXIT_FAILURE);
>  }
>
>  sig = atoi(argv[2]);
>
>  printf("Sending signal %d to process %s\n", sig, argv[1]);
>  ret = do_procfd_send_signal(fd, sig, NULL, 0);
>
> 

Re: [GIT PULL] MFD fixes for v4.20

2018-12-05 Thread pr-tracker-bot
The pull request you sent on Wed, 5 Dec 2018 10:08:56 +:

> git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git mfd-fixes-4.20

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/5e0dc1a7ce1f322c06e993845def2c73cfc485fd

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT PULL] MFD fixes for v4.20

2018-12-05 Thread pr-tracker-bot
The pull request you sent on Wed, 5 Dec 2018 10:08:56 +:

> git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git mfd-fixes-4.20

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/5e0dc1a7ce1f322c06e993845def2c73cfc485fd

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT PULL] parisc architecture fix for kernel v4.20-rc6

2018-12-05 Thread pr-tracker-bot
The pull request you sent on Tue, 4 Dec 2018 16:00:32 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux.git 
> parisc-4.20-4

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/a517af52cb57f972cee2378765e1c5dd10141f0d

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [RFC PATCH 02/14] mm/hms: heterogenenous memory system (HMS) documentation

2018-12-05 Thread Logan Gunthorpe



On 2018-12-05 11:07 a.m., Jerome Glisse wrote:
>> Well multiple links are easy when you have a 'link' bus. Just add
>> another link device under the bus.
> 
> So you are telling do what i am doing in this patchset but not under
> HMS directory ?

No, it's completely different. I'm talking about creating a bus to
describe only the real hardware that links GPUs. Not creating a new
virtual tree containing a bunch of duplicate bus and device information
that already exists currently in sysfs.

>>
>> Technically, the accessibility issue is already encoded in sysfs. For
>> example, through the PCI tree you can determine which ACS bits are set
>> and determine which devices are behind the same root bridge the same way
>> we do in the kernel p2pdma subsystem. This is all bus specific which is
>> fine, but if we want to change that, we should have a common way for
>> existing buses to describe these attributes in the existing tree. The
>> new 'link' bus devices would have to have some way to describe cases if
>> memory isn't accessible in some way across it.
> 
> What i am looking at is much more complex than just access bit. It
> is a whole set of properties attach to each path (can it be cache
> coherent ? can it do atomic ? what is the access granularity ? what
> is the bandwidth ? is it dedicated link ? ...)

I'm not talking about just an access bit. I'm talking about what you are
describing: standard ways for *existing* buses in the sysfs hierarchy to
describe things like cache coherency, atomics, granularity, etc without
creating a new hierarchy.

> On top of that i argue that more people would use that information if it
> were available to them. I agree that i have no hard evidence to back that
> up and that it is just a feeling but you can not disprove me either as
> this is a chicken and egg problem, you can not prove people will not use
> an API if the API is not there to be use.

And you miss my point that much of this information is already available
to them. And more can be added in the existing framework without
creating any brand new concepts. I haven't said anything about
chicken-and-egg problems -- I've given you a bunch of different
suggestions to split this up into more managable problems and address
many of them within the APIs and frameworks we have already.

Logan


Re: [RFC PATCH 02/14] mm/hms: heterogenenous memory system (HMS) documentation

2018-12-05 Thread Logan Gunthorpe



On 2018-12-05 11:07 a.m., Jerome Glisse wrote:
>> Well multiple links are easy when you have a 'link' bus. Just add
>> another link device under the bus.
> 
> So you are telling do what i am doing in this patchset but not under
> HMS directory ?

No, it's completely different. I'm talking about creating a bus to
describe only the real hardware that links GPUs. Not creating a new
virtual tree containing a bunch of duplicate bus and device information
that already exists currently in sysfs.

>>
>> Technically, the accessibility issue is already encoded in sysfs. For
>> example, through the PCI tree you can determine which ACS bits are set
>> and determine which devices are behind the same root bridge the same way
>> we do in the kernel p2pdma subsystem. This is all bus specific which is
>> fine, but if we want to change that, we should have a common way for
>> existing buses to describe these attributes in the existing tree. The
>> new 'link' bus devices would have to have some way to describe cases if
>> memory isn't accessible in some way across it.
> 
> What i am looking at is much more complex than just access bit. It
> is a whole set of properties attach to each path (can it be cache
> coherent ? can it do atomic ? what is the access granularity ? what
> is the bandwidth ? is it dedicated link ? ...)

I'm not talking about just an access bit. I'm talking about what you are
describing: standard ways for *existing* buses in the sysfs hierarchy to
describe things like cache coherency, atomics, granularity, etc without
creating a new hierarchy.

> On top of that i argue that more people would use that information if it
> were available to them. I agree that i have no hard evidence to back that
> up and that it is just a feeling but you can not disprove me either as
> this is a chicken and egg problem, you can not prove people will not use
> an API if the API is not there to be use.

And you miss my point that much of this information is already available
to them. And more can be added in the existing framework without
creating any brand new concepts. I haven't said anything about
chicken-and-egg problems -- I've given you a bunch of different
suggestions to split this up into more managable problems and address
many of them within the APIs and frameworks we have already.

Logan


Re: [GIT PULL] parisc architecture fix for kernel v4.20-rc6

2018-12-05 Thread pr-tracker-bot
The pull request you sent on Tue, 4 Dec 2018 16:00:32 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux.git 
> parisc-4.20-4

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/a517af52cb57f972cee2378765e1c5dd10141f0d

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT PULL] Power management fix for v4.20-rc6

2018-12-05 Thread pr-tracker-bot
The pull request you sent on Wed, 5 Dec 2018 12:53:50 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git pm-4.20-rc6

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/91dd51f6e7ce7f1eba5d7b583790e134367d0fbb

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT PULL] Power management fix for v4.20-rc6

2018-12-05 Thread pr-tracker-bot
The pull request you sent on Wed, 5 Dec 2018 12:53:50 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git pm-4.20-rc6

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/91dd51f6e7ce7f1eba5d7b583790e134367d0fbb

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [git pull] Input updates for v4.20-rc5

2018-12-05 Thread pr-tracker-bot
The pull request you sent on Mon, 3 Dec 2018 19:37:46 -0800:

> git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/adac0753c25217a2365b132c87cb2540b51fa89b

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [git pull] Input updates for v4.20-rc5

2018-12-05 Thread pr-tracker-bot
The pull request you sent on Mon, 3 Dec 2018 19:37:46 -0800:

> git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/adac0753c25217a2365b132c87cb2540b51fa89b

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [PATCH v3] signal: add procfd_send_signal() syscall

2018-12-05 Thread Eric W. Biederman
Christian Brauner  writes:

> The kill() syscall operates on process identifiers (pid). After a process
> has exited its pid can be reused by another process. If a caller sends a
> signal to a reused pid it will end up signaling the wrong process. This
> issue has often surfaced and there has been a push [1] to address this
> problem.
>
> This patch uses file descriptors (fd) from proc/ as stable handles on
> struct pid. Even if a pid is recycled the handle will not change. The fd
> can be used to send signals to the process it refers to.
> Thus, the new syscall procfd_send_signal() is introduced to solve this
> problem. Instead of pids it operates on process fds (procfd).
>
> /* prototype and argument /*
> long procfd_send_signal(int fd, int sig, siginfo_t *info, unsigned int flags);
>
> In addition to the procfd and signal argument it takes an additional
> siginfo_t and flags argument. If the siginfo_t argument is NULL then
> procfd_send_signal() behaves like kill(). If it is not NULL
> procfd_send_signal() behaves like rt_sigqueueinfo().
> The flags argument is added to allow for future extensions of this syscall.
> It currently needs to be passed as 0. Failing to do so will cause EINVAL.
>
> /* procfd_send_signal() replaces multiple pid-based syscalls */
> The procfd_send_signal() syscall currently takes on the job of the
> following syscalls that operate on pids:
> - kill(2)
> - rt_sigqueueinfo(2)
> The syscall is defined in such a way that it can also operate on thread fds
> instead of process fds. In a future patchset I will extend it to operate on
> procfds from /proc//task/ at which point it will additionally
> take on the job of:
> - tgkill(2)
> - rt_tgsigqueueinfo(2)
> Right now this is intentionally left out to keep this patchset as simple as
> possible (cf. [4]). If a procfd of /proc//task/ is passed
> EOPNOTSUPP will be returned to give userspace a way to detect when I add
> support for such procfds {cf. [10]).
>
> /* naming */
> The name procfd_send_signal() was chosen to reflect the fact that it takes
> on the job of multiple syscalls. It is intentional that the name is not
> reminiscent of neither kill(2) nor rt_sigqueueinfo(2). Not the fomer
> because it might imply that procfd_send_signal() is only a replacement for
> kill(2) and not the latter because it is a hazzle to remember the correct
> spelling (especially for non-native speakers) and because it is not
> descriptive enough of what the syscall actually does. The name
> "procfd_send_signal" makes it very clear that its job is to send signals.
>
> /* O_PATH file descriptors */
> procfds opened as O_PATH fds cannot be used to send signals to a process
> (cf. [2]). Signaling processes through procfds is the equivalent of writing
> to a file. Thus, this is not an operation that operates "purely at the
> file descriptor level" as required by the open(2) manpage.
>
> /* zombies */
> Zombies can be signaled just as any other process. No special error will be
> reported since a zombie state is an unreliable state (cf. [3]).
>
> /* cross-namespace signals */
> The patch currently enforces that the signaler and signalee either are in
> the same pid namespace or that the signaler's pid namespace is an ancestor
> of the signalee's pid namespace. This is done for the sake of simplicity
> and because it is unclear to what values certain members of struct
> siginfo_t would need to be set to (cf. [5], [6]).
>
> /* compat syscalls */
> It became clear that we would like to avoid adding compat syscalls (cf.
> [7]). The compat syscall handling is now done in kernel/signal.c itself by
> adding __copy_siginfo_from_user_generic() which lets us avoid compat
> syscalls (cf. [8]).
> With upcoming rework for syscall handling things might improve even further
> (cf. [11]).
> This patch was tested on x64 and x86.
>
> /* userspace usage */
> An asciinema recording for the basic functionality can be found under [9].
> With this patch a process can be killed via:
>
>  #define _GNU_SOURCE
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>
>  static inline int do_procfd_send_signal(int procfd, int sig, siginfo_t *info,
>  unsigned int flags)
>  {
>  #ifdef __NR_procfd_send_signal
>  return syscall(__NR_procfd_send_signal, procfd, sig, info, flags);
>  #else
>  return -ENOSYS;
>  #endif
>  }
>
>  int main(int argc, char *argv[])
>  {
>  int fd, ret, saved_errno, sig;
>
>  if (argc < 3)
>  exit(EXIT_FAILURE);
>
>  fd = open(argv[1], O_DIRECTORY | O_CLOEXEC);
>  if (fd < 0) {
>  printf("%s - Failed to open \"%s\"\n", strerror(errno), 
> argv[1]);
>  exit(EXIT_FAILURE);
>  }
>
>  sig = atoi(argv[2]);
>
>  printf("Sending signal %d to process %s\n", sig, argv[1]);
>  ret = do_procfd_send_signal(fd, sig, NULL, 0);
>
> 

Re: [PATCH v3] signal: add procfd_send_signal() syscall

2018-12-05 Thread Eric W. Biederman
Christian Brauner  writes:

> The kill() syscall operates on process identifiers (pid). After a process
> has exited its pid can be reused by another process. If a caller sends a
> signal to a reused pid it will end up signaling the wrong process. This
> issue has often surfaced and there has been a push [1] to address this
> problem.
>
> This patch uses file descriptors (fd) from proc/ as stable handles on
> struct pid. Even if a pid is recycled the handle will not change. The fd
> can be used to send signals to the process it refers to.
> Thus, the new syscall procfd_send_signal() is introduced to solve this
> problem. Instead of pids it operates on process fds (procfd).
>
> /* prototype and argument /*
> long procfd_send_signal(int fd, int sig, siginfo_t *info, unsigned int flags);
>
> In addition to the procfd and signal argument it takes an additional
> siginfo_t and flags argument. If the siginfo_t argument is NULL then
> procfd_send_signal() behaves like kill(). If it is not NULL
> procfd_send_signal() behaves like rt_sigqueueinfo().
> The flags argument is added to allow for future extensions of this syscall.
> It currently needs to be passed as 0. Failing to do so will cause EINVAL.
>
> /* procfd_send_signal() replaces multiple pid-based syscalls */
> The procfd_send_signal() syscall currently takes on the job of the
> following syscalls that operate on pids:
> - kill(2)
> - rt_sigqueueinfo(2)
> The syscall is defined in such a way that it can also operate on thread fds
> instead of process fds. In a future patchset I will extend it to operate on
> procfds from /proc//task/ at which point it will additionally
> take on the job of:
> - tgkill(2)
> - rt_tgsigqueueinfo(2)
> Right now this is intentionally left out to keep this patchset as simple as
> possible (cf. [4]). If a procfd of /proc//task/ is passed
> EOPNOTSUPP will be returned to give userspace a way to detect when I add
> support for such procfds {cf. [10]).
>
> /* naming */
> The name procfd_send_signal() was chosen to reflect the fact that it takes
> on the job of multiple syscalls. It is intentional that the name is not
> reminiscent of neither kill(2) nor rt_sigqueueinfo(2). Not the fomer
> because it might imply that procfd_send_signal() is only a replacement for
> kill(2) and not the latter because it is a hazzle to remember the correct
> spelling (especially for non-native speakers) and because it is not
> descriptive enough of what the syscall actually does. The name
> "procfd_send_signal" makes it very clear that its job is to send signals.
>
> /* O_PATH file descriptors */
> procfds opened as O_PATH fds cannot be used to send signals to a process
> (cf. [2]). Signaling processes through procfds is the equivalent of writing
> to a file. Thus, this is not an operation that operates "purely at the
> file descriptor level" as required by the open(2) manpage.
>
> /* zombies */
> Zombies can be signaled just as any other process. No special error will be
> reported since a zombie state is an unreliable state (cf. [3]).
>
> /* cross-namespace signals */
> The patch currently enforces that the signaler and signalee either are in
> the same pid namespace or that the signaler's pid namespace is an ancestor
> of the signalee's pid namespace. This is done for the sake of simplicity
> and because it is unclear to what values certain members of struct
> siginfo_t would need to be set to (cf. [5], [6]).
>
> /* compat syscalls */
> It became clear that we would like to avoid adding compat syscalls (cf.
> [7]). The compat syscall handling is now done in kernel/signal.c itself by
> adding __copy_siginfo_from_user_generic() which lets us avoid compat
> syscalls (cf. [8]).
> With upcoming rework for syscall handling things might improve even further
> (cf. [11]).
> This patch was tested on x64 and x86.
>
> /* userspace usage */
> An asciinema recording for the basic functionality can be found under [9].
> With this patch a process can be killed via:
>
>  #define _GNU_SOURCE
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>
>  static inline int do_procfd_send_signal(int procfd, int sig, siginfo_t *info,
>  unsigned int flags)
>  {
>  #ifdef __NR_procfd_send_signal
>  return syscall(__NR_procfd_send_signal, procfd, sig, info, flags);
>  #else
>  return -ENOSYS;
>  #endif
>  }
>
>  int main(int argc, char *argv[])
>  {
>  int fd, ret, saved_errno, sig;
>
>  if (argc < 3)
>  exit(EXIT_FAILURE);
>
>  fd = open(argv[1], O_DIRECTORY | O_CLOEXEC);
>  if (fd < 0) {
>  printf("%s - Failed to open \"%s\"\n", strerror(errno), 
> argv[1]);
>  exit(EXIT_FAILURE);
>  }
>
>  sig = atoi(argv[2]);
>
>  printf("Sending signal %d to process %s\n", sig, argv[1]);
>  ret = do_procfd_send_signal(fd, sig, NULL, 0);
>
> 

Re: [PATCH 2/5] ARC: perf: introduce Kernel PMU events support

2018-12-05 Thread Vineet Gupta
On 12/5/18 9:06 AM, Eugeniy Paltsev wrote:
> Export all available ARC architected hardware events as
> kernel PMU events to make non-generic events accessible.
>
> ARC PMU HW allow us to read the list of all available
> events names. So we generate kernel PMU event list
> dynamically in arc_pmu_device_probe() using
> human-readable events names we got from HW instead of
> using pre-defined events list.
>
> -->8--
> $ perf list
>   [snip]
>   arc_pmu/bdata64/  [Kernel PMU event]
>   arc_pmu/bdcstall/ [Kernel PMU event]
>   arc_pmu/bdslot/   [Kernel PMU event]
>   arc_pmu/bfbmp/[Kernel PMU event]
>   arc_pmu/bfirqex/  [Kernel PMU event]
>   arc_pmu/bflgstal/ [Kernel PMU event]
>   arc_pmu/bflush/   [Kernel PMU event]

Lets call this pct iso pmu since pmu has more of power mgmt connotation. I know
the code has pmu littered all over but atleast the user interface could be more
intuitive.

BTW this approach seems more user friendly and is different from Alexey's 
earlier
stab at implementing raw events [1]. He didn't keep around any list (and relied 
on
use rlooking in the PRM to find his interesting event of the day) and pass as
*r*foobar.  PeterZ at the time had some reservations with that which I never 
fully
understood.

[1] https://lore.kernel.org/patchwork/patch/568769/

> -->8--
>
> Signed-off-by: Eugeniy Paltsev 
> ---
>  arch/arc/kernel/perf_event.c | 107 
> ++-
>  1 file changed, 106 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
> index 811a07a2ca21..97b88b00c418 100644
> --- a/arch/arc/kernel/perf_event.c
> +++ b/arch/arc/kernel/perf_event.c
> @@ -22,12 +22,28 @@
>  /* HW holds 8 symbols + one for null terminator */
>  #define ARCPMU_EVENT_NAME_LEN9
>  
> +enum arc_pmu_attr_groups {
> + ARCPMU_ATTR_GR_EVENTS,
> + ARCPMU_ATTR_GR_FORMATS,
> + ARCPMU_NR_ATTR_GR
> +};
> +
> +struct arc_pmu_raw_event_entry {
> + char name[ARCPMU_EVENT_NAME_LEN];
> +};
> +
>  struct arc_pmu {
>   struct pmu  pmu;
>   unsigned intirq;
>   int n_counters;
> + int n_events;
>   u64 max_period;
>   int ev_hw_idx[PERF_COUNT_ARC_HW_MAX];
> +
> + struct arc_pmu_raw_event_entry  *raw_entry;
> + struct attribute**attrs;
> + struct perf_pmu_events_attr *attr;
> + const struct attribute_group*attr_groups[ARCPMU_NR_ATTR_GR + 1];
>  };
>  
>  struct arc_pmu_cpu {
> @@ -196,6 +212,17 @@ static int arc_pmu_event_init(struct perf_event *event)
>(int)hwc->config, arc_pmu_ev_hw_map[ret]);
>   return 0;
>  
> + case PERF_TYPE_RAW:
> + if (event->attr.config >= arc_pmu->n_events)
> + return -ENOENT;
> +
> + hwc->config |= event->attr.config;
> + pr_debug("init raw event with idx %lld \'%s\'\n",
> +  event->attr.config,
> +  arc_pmu->raw_entry[event->attr.config].name);
> +
> + return 0;
> +
>   default:
>   return -ENOENT;
>   }
> @@ -446,6 +473,68 @@ static void arc_cpu_pmu_irq_init(void *data)
>   write_aux_reg(ARC_REG_PCT_INT_ACT, 0x);
>  }
>  
> +/* Event field occupies the bottom 15 bits of our config field */
> +PMU_FORMAT_ATTR(event, "config:0-14");
> +static struct attribute *arc_pmu_format_attrs[] = {
> + _attr_event.attr,
> + NULL,
> +};
> +
> +static struct attribute_group arc_pmu_format_attr_gr = {
> + .name = "format",
> + .attrs = arc_pmu_format_attrs,
> +};
> +
> +static ssize_t
> +arc_pmu_events_sysfs_show(struct device *dev,
> +   struct device_attribute *attr, char *page)
> +{
> + struct perf_pmu_events_attr *pmu_attr;
> +
> + pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr);
> + return sprintf(page, "event=0x%04llx\n", pmu_attr->id);
> +}
> +
> +/*
> + * We don't add attrs here as we don't have pre-defined list of perf events.
> + * We will generate and add attrs dynamically in probe() after we read HW
> + * configuration.
> + */
> +static struct attribute_group arc_pmu_events_attr_gr = {
> + .name = "events",
> +};
> +
> +static void arc_pmu_add_raw_event_attr(int j, char *str)
> +{
> + memmove(arc_pmu->raw_entry[j].name, str, ARCPMU_EVENT_NAME_LEN - 1);
> + arc_pmu->attr[j].attr.attr.name = arc_pmu->raw_entry[j].name;
> + arc_pmu->attr[j].attr.attr.mode = VERIFY_OCTAL_PERMISSIONS(0444);
> + arc_pmu->attr[j].attr.show = arc_pmu_events_sysfs_show;
> + arc_pmu->attr[j].id = j;
> + arc_pmu->attrs[j] = &(arc_pmu->attr[j].attr.attr);
> +}
> +
> +static int arc_pmu_raw_alloc(struct device *dev)
> +{
> + 

Re: [PATCH 2/5] ARC: perf: introduce Kernel PMU events support

2018-12-05 Thread Vineet Gupta
On 12/5/18 9:06 AM, Eugeniy Paltsev wrote:
> Export all available ARC architected hardware events as
> kernel PMU events to make non-generic events accessible.
>
> ARC PMU HW allow us to read the list of all available
> events names. So we generate kernel PMU event list
> dynamically in arc_pmu_device_probe() using
> human-readable events names we got from HW instead of
> using pre-defined events list.
>
> -->8--
> $ perf list
>   [snip]
>   arc_pmu/bdata64/  [Kernel PMU event]
>   arc_pmu/bdcstall/ [Kernel PMU event]
>   arc_pmu/bdslot/   [Kernel PMU event]
>   arc_pmu/bfbmp/[Kernel PMU event]
>   arc_pmu/bfirqex/  [Kernel PMU event]
>   arc_pmu/bflgstal/ [Kernel PMU event]
>   arc_pmu/bflush/   [Kernel PMU event]

Lets call this pct iso pmu since pmu has more of power mgmt connotation. I know
the code has pmu littered all over but atleast the user interface could be more
intuitive.

BTW this approach seems more user friendly and is different from Alexey's 
earlier
stab at implementing raw events [1]. He didn't keep around any list (and relied 
on
use rlooking in the PRM to find his interesting event of the day) and pass as
*r*foobar.  PeterZ at the time had some reservations with that which I never 
fully
understood.

[1] https://lore.kernel.org/patchwork/patch/568769/

> -->8--
>
> Signed-off-by: Eugeniy Paltsev 
> ---
>  arch/arc/kernel/perf_event.c | 107 
> ++-
>  1 file changed, 106 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
> index 811a07a2ca21..97b88b00c418 100644
> --- a/arch/arc/kernel/perf_event.c
> +++ b/arch/arc/kernel/perf_event.c
> @@ -22,12 +22,28 @@
>  /* HW holds 8 symbols + one for null terminator */
>  #define ARCPMU_EVENT_NAME_LEN9
>  
> +enum arc_pmu_attr_groups {
> + ARCPMU_ATTR_GR_EVENTS,
> + ARCPMU_ATTR_GR_FORMATS,
> + ARCPMU_NR_ATTR_GR
> +};
> +
> +struct arc_pmu_raw_event_entry {
> + char name[ARCPMU_EVENT_NAME_LEN];
> +};
> +
>  struct arc_pmu {
>   struct pmu  pmu;
>   unsigned intirq;
>   int n_counters;
> + int n_events;
>   u64 max_period;
>   int ev_hw_idx[PERF_COUNT_ARC_HW_MAX];
> +
> + struct arc_pmu_raw_event_entry  *raw_entry;
> + struct attribute**attrs;
> + struct perf_pmu_events_attr *attr;
> + const struct attribute_group*attr_groups[ARCPMU_NR_ATTR_GR + 1];
>  };
>  
>  struct arc_pmu_cpu {
> @@ -196,6 +212,17 @@ static int arc_pmu_event_init(struct perf_event *event)
>(int)hwc->config, arc_pmu_ev_hw_map[ret]);
>   return 0;
>  
> + case PERF_TYPE_RAW:
> + if (event->attr.config >= arc_pmu->n_events)
> + return -ENOENT;
> +
> + hwc->config |= event->attr.config;
> + pr_debug("init raw event with idx %lld \'%s\'\n",
> +  event->attr.config,
> +  arc_pmu->raw_entry[event->attr.config].name);
> +
> + return 0;
> +
>   default:
>   return -ENOENT;
>   }
> @@ -446,6 +473,68 @@ static void arc_cpu_pmu_irq_init(void *data)
>   write_aux_reg(ARC_REG_PCT_INT_ACT, 0x);
>  }
>  
> +/* Event field occupies the bottom 15 bits of our config field */
> +PMU_FORMAT_ATTR(event, "config:0-14");
> +static struct attribute *arc_pmu_format_attrs[] = {
> + _attr_event.attr,
> + NULL,
> +};
> +
> +static struct attribute_group arc_pmu_format_attr_gr = {
> + .name = "format",
> + .attrs = arc_pmu_format_attrs,
> +};
> +
> +static ssize_t
> +arc_pmu_events_sysfs_show(struct device *dev,
> +   struct device_attribute *attr, char *page)
> +{
> + struct perf_pmu_events_attr *pmu_attr;
> +
> + pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr);
> + return sprintf(page, "event=0x%04llx\n", pmu_attr->id);
> +}
> +
> +/*
> + * We don't add attrs here as we don't have pre-defined list of perf events.
> + * We will generate and add attrs dynamically in probe() after we read HW
> + * configuration.
> + */
> +static struct attribute_group arc_pmu_events_attr_gr = {
> + .name = "events",
> +};
> +
> +static void arc_pmu_add_raw_event_attr(int j, char *str)
> +{
> + memmove(arc_pmu->raw_entry[j].name, str, ARCPMU_EVENT_NAME_LEN - 1);
> + arc_pmu->attr[j].attr.attr.name = arc_pmu->raw_entry[j].name;
> + arc_pmu->attr[j].attr.attr.mode = VERIFY_OCTAL_PERMISSIONS(0444);
> + arc_pmu->attr[j].attr.show = arc_pmu_events_sysfs_show;
> + arc_pmu->attr[j].id = j;
> + arc_pmu->attrs[j] = &(arc_pmu->attr[j].attr.attr);
> +}
> +
> +static int arc_pmu_raw_alloc(struct device *dev)
> +{
> + 

Re: [PATCH v2 00/22] mfd: demodularization of non-modular drivers

2018-12-05 Thread Paul Gortmaker
[RE: [PATCH v2 00/22] mfd: demodularization of non-modular drivers] On 
05/12/2018 (Wed 12:01) Steve Twiss wrote:

> Hi Paul,
> 
> On 03 December 2018 04:23, Paul Gortmaker wrote:
> 
> > Subject: [PATCH v2 00/22] mfd: demodularization of non-modular drivers
> > 
> > [v1 --> v2: add some more commits as requested by Lee (MFD maintainer),
> >  update the 00/NN text; re-do build and link testing on new linux-next. ]
> > 
> > This group of MFD drivers are all controlled by "bool" Kconfig settings,
> > but contain various amounts of largely pointless uses of infrastructure
> > related to modular operations, even though they can't be built modular.
> > 
> > We can easily remove/replace all of it.  We are trying to make driver
> > code consistent with the Makefiles/Kconfigs that control them. 
> 
> For the DA9052 and DA9055, changes:
> 
> -  drivers/mfd/da9052-core.c | 11 ---
> -  drivers/mfd/da9052-i2c.c  | 22 ++---
> -  drivers/mfd/da9052-irq.c  |  1 -
> -  drivers/mfd/da9052-spi.c  | 22 ++---
> -  drivers/mfd/da9055-core.c | 13 ++---
> -  drivers/mfd/da9055-i2c.c  | 24 ++-
> 
> The responsibility can fall back to Dialog for these changes. We will
> submit Kconfig patches for these and make them explicitly non-modular.
> This will remove the ambiguity caused by the Kconfig bool options.

[typo noted: non-modular ---> modular]

Great, I'll look forward to seeing those patches to convert  these
drivers to modular in the near future, and I'll not re-send mine.

Thanks,
P.
--


> 
> Regards,
> Steve
> 
> > This
> > means not using modular functions/macros for drivers that can never be
> > built as a module.  Some of the downfalls this oversight leads to are:
> > 
> >  (1) it is easy to accidentally write unused module_exit and remove code
> >  (2) it can be misleading when reading the source, thinking it can be
> >  modular when the Makefile and/or Kconfig prohibit it
> >  (3) it requires the include of the module.h header file which in turn
> >  includes nearly everything else, thus adding a lot of CPP overhead.
> >  (4) it gets copied/replicated into other drivers and spreads quickly.
> > 
> > What we see in current drivers falls into one or more of the following
> > categories:
> > 
> > 1) include of  when it simply isn't needed
> > 
> > 2) Use of MODULE_LICENSE, MODULE_DEVICE_TABLE,  MODULE_AUTHOR etc.
> >macros that are no-ops for non-modular drivers.
> > 
> > 3) Creation of a module_exit() function that will be compiled and
> >linked in but never actually called for non-modular drivers.
> > 
> > 4) Addition of a platform_driver ".remove" function that is bound
> >into the struct but will never be called for non-module use cases.
> > 
> > Solution to #1 --> #3 is simple ; we just delete the related code.
> > 
> > The solution to #4 is similar - we delete the ".remove" function and
> > the binding into the platform_driver struct.  However, since the same
> > ".remove" function could also be triggered by an "unbind" (such as for
> > pass-through of a device to a guest instance)  - so we also explicitly
> > disable any unbind for the driver.
> > 
> > The unbind mask allows us to ensure we will see if there was some odd
> > corner case out there that was relying on it.  Typically it would be a
> > multi-port ethernet card passing a port through to a guest, so a
> > sensible use case in MFD drivers seems highly unlikely.  This same
> > solution has already been used in multiple other mainline subsystems.
> > 
> > Build testing was done on drivers/mfd for allyesconfig on x86_64, ARM
> > and ARM-64 on a recent linux-next checkout.
> > 
> > Paul.
> > 
> > ---
> > 
> > Cc: Arnd Bergmann 
> > Cc: Cory Maccarrone 
> > Cc: David Dajun Chen 
> > Cc: Dong Aisheng 
> > Cc: Eric Miao 
> > Cc: Graeme Gregory 
> > Cc: Guennadi Liakhovetski 
> > Cc: Haojian Zhuang 
> > Cc: Jin Park 
> > Cc: Jorge Eduardo Candelaria 
> > Cc: Laxman Dewangan 
> > Cc: Lee Jones 
> > Cc: Linus Walleij 
> > Cc: Mark Brown 
> > Cc: Mattias Nilsson 
> > Cc: Michael Hennerich 
> > Cc: Mike Rapoport 
> > Cc: Tony Lindgren 
> > Cc: Venu Byravarasu 
> > Cc: linux-o...@vger.kernel.org
> > Cc: patc...@opensource.cirrus.com
> > Cc: Support Opensource 
> > 
> > 
> > Paul Gortmaker (22):
> >   mfd: aat2870-core: Make it explicitly non-modular
> >   mfd: adp5520: Make it explicitly non-modular
> >   mfd: as3711: Make it explicitly non-modular
> >   mfd: da903x: Make it explicitly non-modular
> >   mfd: da9052-*: Make it explicitly non-modular
> >   mfd: da9055-i2c: Make it explicitly non-modular
> >   mfd: da9055-core: make it explicitly non-modular
> >   mfd: db8500-prcmu: drop unused MODULE_ tags from non-modular code
> >   mfd: htc-i2cpld: Make it explicitly non-modular
> >   mfd: max8925-core: drop unused MODULE_ tags from non-modular code
> >   mfd: rc5t583: Make it explicitly non-modular
> >   mfd: sta2x11: drop unused 

Re: [PATCH v2 00/22] mfd: demodularization of non-modular drivers

2018-12-05 Thread Paul Gortmaker
[RE: [PATCH v2 00/22] mfd: demodularization of non-modular drivers] On 
05/12/2018 (Wed 12:01) Steve Twiss wrote:

> Hi Paul,
> 
> On 03 December 2018 04:23, Paul Gortmaker wrote:
> 
> > Subject: [PATCH v2 00/22] mfd: demodularization of non-modular drivers
> > 
> > [v1 --> v2: add some more commits as requested by Lee (MFD maintainer),
> >  update the 00/NN text; re-do build and link testing on new linux-next. ]
> > 
> > This group of MFD drivers are all controlled by "bool" Kconfig settings,
> > but contain various amounts of largely pointless uses of infrastructure
> > related to modular operations, even though they can't be built modular.
> > 
> > We can easily remove/replace all of it.  We are trying to make driver
> > code consistent with the Makefiles/Kconfigs that control them. 
> 
> For the DA9052 and DA9055, changes:
> 
> -  drivers/mfd/da9052-core.c | 11 ---
> -  drivers/mfd/da9052-i2c.c  | 22 ++---
> -  drivers/mfd/da9052-irq.c  |  1 -
> -  drivers/mfd/da9052-spi.c  | 22 ++---
> -  drivers/mfd/da9055-core.c | 13 ++---
> -  drivers/mfd/da9055-i2c.c  | 24 ++-
> 
> The responsibility can fall back to Dialog for these changes. We will
> submit Kconfig patches for these and make them explicitly non-modular.
> This will remove the ambiguity caused by the Kconfig bool options.

[typo noted: non-modular ---> modular]

Great, I'll look forward to seeing those patches to convert  these
drivers to modular in the near future, and I'll not re-send mine.

Thanks,
P.
--


> 
> Regards,
> Steve
> 
> > This
> > means not using modular functions/macros for drivers that can never be
> > built as a module.  Some of the downfalls this oversight leads to are:
> > 
> >  (1) it is easy to accidentally write unused module_exit and remove code
> >  (2) it can be misleading when reading the source, thinking it can be
> >  modular when the Makefile and/or Kconfig prohibit it
> >  (3) it requires the include of the module.h header file which in turn
> >  includes nearly everything else, thus adding a lot of CPP overhead.
> >  (4) it gets copied/replicated into other drivers and spreads quickly.
> > 
> > What we see in current drivers falls into one or more of the following
> > categories:
> > 
> > 1) include of  when it simply isn't needed
> > 
> > 2) Use of MODULE_LICENSE, MODULE_DEVICE_TABLE,  MODULE_AUTHOR etc.
> >macros that are no-ops for non-modular drivers.
> > 
> > 3) Creation of a module_exit() function that will be compiled and
> >linked in but never actually called for non-modular drivers.
> > 
> > 4) Addition of a platform_driver ".remove" function that is bound
> >into the struct but will never be called for non-module use cases.
> > 
> > Solution to #1 --> #3 is simple ; we just delete the related code.
> > 
> > The solution to #4 is similar - we delete the ".remove" function and
> > the binding into the platform_driver struct.  However, since the same
> > ".remove" function could also be triggered by an "unbind" (such as for
> > pass-through of a device to a guest instance)  - so we also explicitly
> > disable any unbind for the driver.
> > 
> > The unbind mask allows us to ensure we will see if there was some odd
> > corner case out there that was relying on it.  Typically it would be a
> > multi-port ethernet card passing a port through to a guest, so a
> > sensible use case in MFD drivers seems highly unlikely.  This same
> > solution has already been used in multiple other mainline subsystems.
> > 
> > Build testing was done on drivers/mfd for allyesconfig on x86_64, ARM
> > and ARM-64 on a recent linux-next checkout.
> > 
> > Paul.
> > 
> > ---
> > 
> > Cc: Arnd Bergmann 
> > Cc: Cory Maccarrone 
> > Cc: David Dajun Chen 
> > Cc: Dong Aisheng 
> > Cc: Eric Miao 
> > Cc: Graeme Gregory 
> > Cc: Guennadi Liakhovetski 
> > Cc: Haojian Zhuang 
> > Cc: Jin Park 
> > Cc: Jorge Eduardo Candelaria 
> > Cc: Laxman Dewangan 
> > Cc: Lee Jones 
> > Cc: Linus Walleij 
> > Cc: Mark Brown 
> > Cc: Mattias Nilsson 
> > Cc: Michael Hennerich 
> > Cc: Mike Rapoport 
> > Cc: Tony Lindgren 
> > Cc: Venu Byravarasu 
> > Cc: linux-o...@vger.kernel.org
> > Cc: patc...@opensource.cirrus.com
> > Cc: Support Opensource 
> > 
> > 
> > Paul Gortmaker (22):
> >   mfd: aat2870-core: Make it explicitly non-modular
> >   mfd: adp5520: Make it explicitly non-modular
> >   mfd: as3711: Make it explicitly non-modular
> >   mfd: da903x: Make it explicitly non-modular
> >   mfd: da9052-*: Make it explicitly non-modular
> >   mfd: da9055-i2c: Make it explicitly non-modular
> >   mfd: da9055-core: make it explicitly non-modular
> >   mfd: db8500-prcmu: drop unused MODULE_ tags from non-modular code
> >   mfd: htc-i2cpld: Make it explicitly non-modular
> >   mfd: max8925-core: drop unused MODULE_ tags from non-modular code
> >   mfd: rc5t583: Make it explicitly non-modular
> >   mfd: sta2x11: drop unused 

Re: Re: Re: [PATCH] proc/sysctl: fix return error forproc_doulongvec_minmax

2018-12-05 Thread Luis Chamberlain
On Wed, Dec 05, 2018 at 03:10:07PM +0800, cheng.lin...@zte.com.cn wrote:
> > On Mon, Dec 03, 2018 at 01:12:39PM +0800, cheng.lin...@zte.com.cn wrote:
> > > >Cheng, thanks for the patch!
> > > >
> > > >On Fri, Nov 30, 2018 at 02:35:17PM +0800, Cheng Lin wrote:
> > > >> If the number of input parameters is less than the total
> > > >> parameters, an INVAL error will be returned.
> > > >
> > > >Do you mean EINVAL?
> > > >
> > > Yes, it's EINVAL.
> > 
> > Please adjust the commit log.
> > 
> > > >> This patch ensure no error returned in this condition, just
> > > >> like other interfaces do.
> > > >
> > > >Have an actual example to reproduce?
> > > >
> > > >Luis
> > > >
> > > We use proc_doulongvec_minmax to pass up to two parameters with 
> > > kern_table.
> > > e.g.
> > {
> > > .procname   = "monitor_signals",
> > > .data   = _sigs,
> > > .maxlen = 2*sizeof(unsigned long),
> > > .mode   = 0644,
> > > .proc_handler   = proc_doulongvec_minmax,
> > > },
> > >
> > > Reproduce:
> > > When passing two parameters, it's work normal. But passing only one 
> > > parameter, an error "Invalid argument"(EINVAL) is returned.
> > > [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals
> > > [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
> > > 1   2
> > > [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals
> > > -bash: echo: write error: Invalid argument
> > > [root@cl150 ~]# echo $?
> > > 1
> > > [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
> > > 3   2
> > > [root@cl150 ~]#
> > >
> > > The following is the result after apply this patch. No error is returned 
> > > when the number of input parameters is less than the total parameters.
> > > [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals
> > > [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
> > > 1   2
> > > [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals
> > > [root@cl150 ~]# echo $?
> > > 0
> > > [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
> > > 3   2
> > > [root@cl150 ~]#
> > 
> > This would be good to have in the commit log as well. But your patch
> > only addresses one of the proc users, there are a few other checks like
> > this that would also need to be expanded for this. So please expand
> > your patch to cover the other cases as well.
> 
> I have done the check for the interfaces exported in kernel/sysctl.c.
> EXPORT_SYMBOL(proc_dointvec);
> EXPORT_SYMBOL(proc_douintvec);
> EXPORT_SYMBOL(proc_dointvec_jiffies);
> EXPORT_SYMBOL(proc_dointvec_minmax);
> EXPORT_SYMBOL_GPL(proc_douintvec_minmax);
> EXPORT_SYMBOL(proc_dointvec_userhz_jiffies);
> EXPORT_SYMBOL(proc_dointvec_ms_jiffies);
> EXPORT_SYMBOL(proc_dostring);
> EXPORT_SYMBOL(proc_doulongvec_minmax);
> EXPORT_SYMBOL(proc_doulongvec_ms_jiffies_minmax);
> 
> The function call relationship is as follows. There are three processing 
> functions dealing with digital parameters, 
> __do_proc_dointvec/__do_proc_douintvec/__do_proc_doulongvec_minmax.
> 
> proc_dointvec--|
> proc_dointvec_jiffies--|
> proc_dointvec_minmax--|
> proc_dointvec_userhz_jiffies|
> proc_dointvec_ms_jiffies-|-> do_proc_dointvec|-> 
> __do_proc_dointvec
> 
> proc_douintvec-|
> proc_douintvec_minmax-|-> do_proc_douintvec---|-> 
> __do_proc_douintvec
> 
> proc_doulongvec_minmax---|
> proc_doulongvec_ms_jiffies_minmax--|-> do_proc_doulongvec_minmax|-> 
> __do_proc_doulongvec_minmax

OK

> This patch deals with  __do_proc_doulongvec_minmax, just as
> __do_proc_dointvec does, adding a check for parameters 'left'. In
> __do_proc_douintvec, its code implementation explicitly does not
> support multiple inputs.  static int __do_proc_douintvec(...){
>   ...
> /*
>  * Arrays are not supported, keep this simple. *Do not* add
>  * support for them.
>  */
> if (vleft != 1) {
> *lenp = 0;
> return -EINVAL;
>   ...
>   }
> 
>   
> So, just __do_proc_doulongvec_minmax has the problem.  And most use of
> proc_doulongvec_minmax/proc_doulongvec_ms_jiffies_minmax just have one
> parameter.

The above text, up to my OK, is useful information for the commit log,
please add that.

> It's well hidden.

You mean that the issue is not widely spread? If so please add that
comment to the commit log, and resubmit a v2.

  Luis

> Typical multi-parameter applications for
> proc_dointvec, such as /proc/sys/kernel/printk.


Re: Re: Re: [PATCH] proc/sysctl: fix return error forproc_doulongvec_minmax

2018-12-05 Thread Luis Chamberlain
On Wed, Dec 05, 2018 at 03:10:07PM +0800, cheng.lin...@zte.com.cn wrote:
> > On Mon, Dec 03, 2018 at 01:12:39PM +0800, cheng.lin...@zte.com.cn wrote:
> > > >Cheng, thanks for the patch!
> > > >
> > > >On Fri, Nov 30, 2018 at 02:35:17PM +0800, Cheng Lin wrote:
> > > >> If the number of input parameters is less than the total
> > > >> parameters, an INVAL error will be returned.
> > > >
> > > >Do you mean EINVAL?
> > > >
> > > Yes, it's EINVAL.
> > 
> > Please adjust the commit log.
> > 
> > > >> This patch ensure no error returned in this condition, just
> > > >> like other interfaces do.
> > > >
> > > >Have an actual example to reproduce?
> > > >
> > > >Luis
> > > >
> > > We use proc_doulongvec_minmax to pass up to two parameters with 
> > > kern_table.
> > > e.g.
> > {
> > > .procname   = "monitor_signals",
> > > .data   = _sigs,
> > > .maxlen = 2*sizeof(unsigned long),
> > > .mode   = 0644,
> > > .proc_handler   = proc_doulongvec_minmax,
> > > },
> > >
> > > Reproduce:
> > > When passing two parameters, it's work normal. But passing only one 
> > > parameter, an error "Invalid argument"(EINVAL) is returned.
> > > [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals
> > > [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
> > > 1   2
> > > [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals
> > > -bash: echo: write error: Invalid argument
> > > [root@cl150 ~]# echo $?
> > > 1
> > > [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
> > > 3   2
> > > [root@cl150 ~]#
> > >
> > > The following is the result after apply this patch. No error is returned 
> > > when the number of input parameters is less than the total parameters.
> > > [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals
> > > [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
> > > 1   2
> > > [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals
> > > [root@cl150 ~]# echo $?
> > > 0
> > > [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals
> > > 3   2
> > > [root@cl150 ~]#
> > 
> > This would be good to have in the commit log as well. But your patch
> > only addresses one of the proc users, there are a few other checks like
> > this that would also need to be expanded for this. So please expand
> > your patch to cover the other cases as well.
> 
> I have done the check for the interfaces exported in kernel/sysctl.c.
> EXPORT_SYMBOL(proc_dointvec);
> EXPORT_SYMBOL(proc_douintvec);
> EXPORT_SYMBOL(proc_dointvec_jiffies);
> EXPORT_SYMBOL(proc_dointvec_minmax);
> EXPORT_SYMBOL_GPL(proc_douintvec_minmax);
> EXPORT_SYMBOL(proc_dointvec_userhz_jiffies);
> EXPORT_SYMBOL(proc_dointvec_ms_jiffies);
> EXPORT_SYMBOL(proc_dostring);
> EXPORT_SYMBOL(proc_doulongvec_minmax);
> EXPORT_SYMBOL(proc_doulongvec_ms_jiffies_minmax);
> 
> The function call relationship is as follows. There are three processing 
> functions dealing with digital parameters, 
> __do_proc_dointvec/__do_proc_douintvec/__do_proc_doulongvec_minmax.
> 
> proc_dointvec--|
> proc_dointvec_jiffies--|
> proc_dointvec_minmax--|
> proc_dointvec_userhz_jiffies|
> proc_dointvec_ms_jiffies-|-> do_proc_dointvec|-> 
> __do_proc_dointvec
> 
> proc_douintvec-|
> proc_douintvec_minmax-|-> do_proc_douintvec---|-> 
> __do_proc_douintvec
> 
> proc_doulongvec_minmax---|
> proc_doulongvec_ms_jiffies_minmax--|-> do_proc_doulongvec_minmax|-> 
> __do_proc_doulongvec_minmax

OK

> This patch deals with  __do_proc_doulongvec_minmax, just as
> __do_proc_dointvec does, adding a check for parameters 'left'. In
> __do_proc_douintvec, its code implementation explicitly does not
> support multiple inputs.  static int __do_proc_douintvec(...){
>   ...
> /*
>  * Arrays are not supported, keep this simple. *Do not* add
>  * support for them.
>  */
> if (vleft != 1) {
> *lenp = 0;
> return -EINVAL;
>   ...
>   }
> 
>   
> So, just __do_proc_doulongvec_minmax has the problem.  And most use of
> proc_doulongvec_minmax/proc_doulongvec_ms_jiffies_minmax just have one
> parameter.

The above text, up to my OK, is useful information for the commit log,
please add that.

> It's well hidden.

You mean that the issue is not widely spread? If so please add that
comment to the commit log, and resubmit a v2.

  Luis

> Typical multi-parameter applications for
> proc_dointvec, such as /proc/sys/kernel/printk.


Re: [RFC PATCH 02/14] mm/hms: heterogenenous memory system (HMS) documentation

2018-12-05 Thread Jerome Glisse
On Wed, Dec 05, 2018 at 10:41:56AM -0700, Logan Gunthorpe wrote:
> 
> 
> On 2018-12-04 7:31 p.m., Jerome Glisse wrote:
> > How can i express multiple link, or memory that is only accessible
> > by a subset of the devices/CPUs. In today model they are back in
> > assumption like everyone can access all the node which do not hold
> > in what i am trying to do.
> 
> Well multiple links are easy when you have a 'link' bus. Just add
> another link device under the bus.

So you are telling do what i am doing in this patchset but not under
HMS directory ?

> 
> Technically, the accessibility issue is already encoded in sysfs. For
> example, through the PCI tree you can determine which ACS bits are set
> and determine which devices are behind the same root bridge the same way
> we do in the kernel p2pdma subsystem. This is all bus specific which is
> fine, but if we want to change that, we should have a common way for
> existing buses to describe these attributes in the existing tree. The
> new 'link' bus devices would have to have some way to describe cases if
> memory isn't accessible in some way across it.

What i am looking at is much more complex than just access bit. It
is a whole set of properties attach to each path (can it be cache
coherent ? can it do atomic ? what is the access granularity ? what
is the bandwidth ? is it dedicated link ? ...)

> 
> But really, I would say the kernel is responsible for telling you when
> memory is accessible to a list of initiators, so it should be part of
> the checks in a theoretical hbind api. This is already the approach
> p2pdma takes in-kernel: we have functions that tell you if two PCI
> devices can talk to each other and we have functions to give you memory
> accessible by a set of devices. What we don't have is a special tree
> that p2pdma users have to walk through to determine accessibility.

You do not need it, but i do need it they are user out there that are
already depending on the information by getting it through non standard
way. I do want to provide a standard way for userspace to get this.
They are real user out there and i believe their would be more user
if we had a standard way to provide it. You do not believe in it fine.
I will do more work in userspace and more example and i will come back
with more hard evidence until i convince enough people.

> 
> In my eye's, you are just conflating a bunch of different issues that
> are better solved independently in the existing frameworks we have. And
> if they were tackled individually, you'd have a much easier time getting
> them merged one by one.

I don't think i can convince you otherwise. They are user that use topology
please looks at the links i provided, those folks have running program
_today_ they rely on non standard API and would like to move toward standard
API it would improve their life.

On top of that i argue that more people would use that information if it
were available to them. I agree that i have no hard evidence to back that
up and that it is just a feeling but you can not disprove me either as
this is a chicken and egg problem, you can not prove people will not use
an API if the API is not there to be use.

Cheers,
Jérôme


Re: [RFC PATCH 02/14] mm/hms: heterogenenous memory system (HMS) documentation

2018-12-05 Thread Jerome Glisse
On Wed, Dec 05, 2018 at 10:41:56AM -0700, Logan Gunthorpe wrote:
> 
> 
> On 2018-12-04 7:31 p.m., Jerome Glisse wrote:
> > How can i express multiple link, or memory that is only accessible
> > by a subset of the devices/CPUs. In today model they are back in
> > assumption like everyone can access all the node which do not hold
> > in what i am trying to do.
> 
> Well multiple links are easy when you have a 'link' bus. Just add
> another link device under the bus.

So you are telling do what i am doing in this patchset but not under
HMS directory ?

> 
> Technically, the accessibility issue is already encoded in sysfs. For
> example, through the PCI tree you can determine which ACS bits are set
> and determine which devices are behind the same root bridge the same way
> we do in the kernel p2pdma subsystem. This is all bus specific which is
> fine, but if we want to change that, we should have a common way for
> existing buses to describe these attributes in the existing tree. The
> new 'link' bus devices would have to have some way to describe cases if
> memory isn't accessible in some way across it.

What i am looking at is much more complex than just access bit. It
is a whole set of properties attach to each path (can it be cache
coherent ? can it do atomic ? what is the access granularity ? what
is the bandwidth ? is it dedicated link ? ...)

> 
> But really, I would say the kernel is responsible for telling you when
> memory is accessible to a list of initiators, so it should be part of
> the checks in a theoretical hbind api. This is already the approach
> p2pdma takes in-kernel: we have functions that tell you if two PCI
> devices can talk to each other and we have functions to give you memory
> accessible by a set of devices. What we don't have is a special tree
> that p2pdma users have to walk through to determine accessibility.

You do not need it, but i do need it they are user out there that are
already depending on the information by getting it through non standard
way. I do want to provide a standard way for userspace to get this.
They are real user out there and i believe their would be more user
if we had a standard way to provide it. You do not believe in it fine.
I will do more work in userspace and more example and i will come back
with more hard evidence until i convince enough people.

> 
> In my eye's, you are just conflating a bunch of different issues that
> are better solved independently in the existing frameworks we have. And
> if they were tackled individually, you'd have a much easier time getting
> them merged one by one.

I don't think i can convince you otherwise. They are user that use topology
please looks at the links i provided, those folks have running program
_today_ they rely on non standard API and would like to move toward standard
API it would improve their life.

On top of that i argue that more people would use that information if it
were available to them. I agree that i have no hard evidence to back that
up and that it is just a feeling but you can not disprove me either as
this is a chicken and egg problem, you can not prove people will not use
an API if the API is not there to be use.

Cheers,
Jérôme


[tip:x86/mm] x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init()

2018-12-05 Thread tip-bot for Dan Williams
Commit-ID:  ba6f508d0ec4adb09f0a939af6d5e19cdfa8667d
Gitweb: https://git.kernel.org/tip/ba6f508d0ec4adb09f0a939af6d5e19cdfa8667d
Author: Dan Williams 
AuthorDate: Tue, 4 Dec 2018 13:37:27 -0800
Committer:  Ingo Molnar 
CommitDate: Wed, 5 Dec 2018 09:03:07 +0100

x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init()

Commit:

  f77084d96355 "x86/mm/pat: Disable preemption around __flush_tlb_all()"

addressed a case where __flush_tlb_all() is called without preemption
being disabled. It also left a warning to catch other cases where
preemption is not disabled.

That warning triggers for the memory hotplug path which is also used for
persistent memory enabling:

 WARNING: CPU: 35 PID: 911 at ./arch/x86/include/asm/tlbflush.h:460
 RIP: 0010:__flush_tlb_all+0x1b/0x3a
 [..]
 Call Trace:
  phys_pud_init+0x29c/0x2bb
  kernel_physical_mapping_init+0xfc/0x219
  init_memory_mapping+0x1a5/0x3b0
  arch_add_memory+0x2c/0x50
  devm_memremap_pages+0x3aa/0x610
  pmem_attach_disk+0x585/0x700 [nd_pmem]

Andy wondered why a path that can sleep was using __flush_tlb_all() [1]
and Dave confirmed the expectation for TLB flush is for modifying /
invalidating existing PTE entries, but not initial population [2]. Drop
the usage of __flush_tlb_all() in phys_{p4d,pud,pmd}_init() on the
expectation that this path is only ever populating empty entries for the
linear map. Note, at linear map teardown time there is a call to the
all-cpu flush_tlb_all() to invalidate the removed mappings.

[1]: 
https://lkml.kernel.org/r/9dfd717d-857d-493d-a606-b635d72ba...@amacapital.net
[2]: https://lkml.kernel.org/r/749919a4-cdb1-48a3-adb4-adb81a5fa...@intel.com

[ mingo: Minor readability edits. ]

Suggested-by: Dave Hansen 
Reported-by: Andy Lutomirski 
Signed-off-by: Dan Williams 
Acked-by: Peter Zijlstra (Intel) 
Acked-by: Kirill A. Shutemov 
Cc: 
Cc: Borislav Petkov 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Sebastian Andrzej Siewior 
Cc: Thomas Gleixner 
Cc: dave.han...@intel.com
Fixes: f77084d96355 ("x86/mm/pat: Disable preemption around __flush_tlb_all()")
Link: 
http://lkml.kernel.org/r/154395944713.32119.15611079023837132638.st...@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/mm/init_64.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 3e25ac2793ef..484c1b92f078 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -584,7 +584,6 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, 
unsigned long paddr_end,
   paddr_end,
   page_size_mask,
   prot);
-   __flush_tlb_all();
continue;
}
/*
@@ -627,7 +626,6 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, 
unsigned long paddr_end,
pud_populate_safe(_mm, pud, pmd);
spin_unlock(_mm.page_table_lock);
}
-   __flush_tlb_all();
 
update_page_count(PG_LEVEL_1G, pages);
 
@@ -668,7 +666,6 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, 
unsigned long paddr_end,
paddr_last = phys_pud_init(pud, paddr,
paddr_end,
page_size_mask);
-   __flush_tlb_all();
continue;
}
 
@@ -680,7 +677,6 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, 
unsigned long paddr_end,
p4d_populate_safe(_mm, p4d, pud);
spin_unlock(_mm.page_table_lock);
}
-   __flush_tlb_all();
 
return paddr_last;
 }
@@ -733,8 +729,6 @@ kernel_physical_mapping_init(unsigned long paddr_start,
if (pgd_changed)
sync_global_pgds(vaddr_start, vaddr_end - 1);
 
-   __flush_tlb_all();
-
return paddr_last;
 }
 


[tip:x86/mm] x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init()

2018-12-05 Thread tip-bot for Dan Williams
Commit-ID:  ba6f508d0ec4adb09f0a939af6d5e19cdfa8667d
Gitweb: https://git.kernel.org/tip/ba6f508d0ec4adb09f0a939af6d5e19cdfa8667d
Author: Dan Williams 
AuthorDate: Tue, 4 Dec 2018 13:37:27 -0800
Committer:  Ingo Molnar 
CommitDate: Wed, 5 Dec 2018 09:03:07 +0100

x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init()

Commit:

  f77084d96355 "x86/mm/pat: Disable preemption around __flush_tlb_all()"

addressed a case where __flush_tlb_all() is called without preemption
being disabled. It also left a warning to catch other cases where
preemption is not disabled.

That warning triggers for the memory hotplug path which is also used for
persistent memory enabling:

 WARNING: CPU: 35 PID: 911 at ./arch/x86/include/asm/tlbflush.h:460
 RIP: 0010:__flush_tlb_all+0x1b/0x3a
 [..]
 Call Trace:
  phys_pud_init+0x29c/0x2bb
  kernel_physical_mapping_init+0xfc/0x219
  init_memory_mapping+0x1a5/0x3b0
  arch_add_memory+0x2c/0x50
  devm_memremap_pages+0x3aa/0x610
  pmem_attach_disk+0x585/0x700 [nd_pmem]

Andy wondered why a path that can sleep was using __flush_tlb_all() [1]
and Dave confirmed the expectation for TLB flush is for modifying /
invalidating existing PTE entries, but not initial population [2]. Drop
the usage of __flush_tlb_all() in phys_{p4d,pud,pmd}_init() on the
expectation that this path is only ever populating empty entries for the
linear map. Note, at linear map teardown time there is a call to the
all-cpu flush_tlb_all() to invalidate the removed mappings.

[1]: 
https://lkml.kernel.org/r/9dfd717d-857d-493d-a606-b635d72ba...@amacapital.net
[2]: https://lkml.kernel.org/r/749919a4-cdb1-48a3-adb4-adb81a5fa...@intel.com

[ mingo: Minor readability edits. ]

Suggested-by: Dave Hansen 
Reported-by: Andy Lutomirski 
Signed-off-by: Dan Williams 
Acked-by: Peter Zijlstra (Intel) 
Acked-by: Kirill A. Shutemov 
Cc: 
Cc: Borislav Petkov 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Sebastian Andrzej Siewior 
Cc: Thomas Gleixner 
Cc: dave.han...@intel.com
Fixes: f77084d96355 ("x86/mm/pat: Disable preemption around __flush_tlb_all()")
Link: 
http://lkml.kernel.org/r/154395944713.32119.15611079023837132638.st...@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/mm/init_64.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 3e25ac2793ef..484c1b92f078 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -584,7 +584,6 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, 
unsigned long paddr_end,
   paddr_end,
   page_size_mask,
   prot);
-   __flush_tlb_all();
continue;
}
/*
@@ -627,7 +626,6 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, 
unsigned long paddr_end,
pud_populate_safe(_mm, pud, pmd);
spin_unlock(_mm.page_table_lock);
}
-   __flush_tlb_all();
 
update_page_count(PG_LEVEL_1G, pages);
 
@@ -668,7 +666,6 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, 
unsigned long paddr_end,
paddr_last = phys_pud_init(pud, paddr,
paddr_end,
page_size_mask);
-   __flush_tlb_all();
continue;
}
 
@@ -680,7 +677,6 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, 
unsigned long paddr_end,
p4d_populate_safe(_mm, p4d, pud);
spin_unlock(_mm.page_table_lock);
}
-   __flush_tlb_all();
 
return paddr_last;
 }
@@ -733,8 +729,6 @@ kernel_physical_mapping_init(unsigned long paddr_start,
if (pgd_changed)
sync_global_pgds(vaddr_start, vaddr_end - 1);
 
-   __flush_tlb_all();
-
return paddr_last;
 }
 


[tip:x86/mm] generic/pgtable: Introduce set_pte_safe()

2018-12-05 Thread tip-bot for Dan Williams
Commit-ID:  4369deaa2f022ef92da45a0e7eec8a4a52e8e8a4
Gitweb: https://git.kernel.org/tip/4369deaa2f022ef92da45a0e7eec8a4a52e8e8a4
Author: Dan Williams 
AuthorDate: Tue, 4 Dec 2018 13:37:16 -0800
Committer:  Ingo Molnar 
CommitDate: Wed, 5 Dec 2018 09:03:06 +0100

generic/pgtable: Introduce set_pte_safe()

Commit:

  f77084d96355 "x86/mm/pat: Disable preemption around __flush_tlb_all()"

introduced a warning to capture cases __flush_tlb_all() is called without
pre-emption disabled. It triggers a false positive warning in the memory
hotplug path.

On investigation it was found that the __flush_tlb_all() calls are not
necessary. However, they are only "not necessary" in practice provided
the ptes are being initially populated from the !present state.

Introduce set_pte_safe() as a sanity check that the pte is being updated
in a way that does not require a TLB flush.

Forgive the macro, the availability of the various of set_pte() levels
is hit and miss across architectures.

[ mingo: Minor readability edits. ]

Suggested-by: Peter Zijlstra 
Suggested-by: Dave Hansen 
Signed-off-by: Dan Williams 
Acked-by: Peter Zijlstra (Intel) 
Acked-by: Kirill A. Shutemov 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Rik van Riel 
Cc: Sebastian Andrzej Siewior 
Cc: Thomas Gleixner 
Link: https://lkml.kernel.org/r/279dadae-9148-465c-7ec6-3f37e026c...@intel.com
Signed-off-by: Ingo Molnar 
---
 include/asm-generic/pgtable.h | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index dae7f98babed..a9cac82e9a7a 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -400,6 +400,44 @@ static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
 }
 #endif
 
+/*
+ * Use set_p*_safe(), and elide TLB flushing, when confident that *no*
+ * TLB flush will be required as a result of the "set". For example, use
+ * in scenarios where it is known ahead of time that the routine is
+ * setting non-present entries, or re-setting an existing entry to the
+ * same value. Otherwise, use the typical "set" helpers and flush the
+ * TLB.
+ */
+#define set_pte_safe(ptep, pte) \
+({ \
+   WARN_ON_ONCE(pte_present(*ptep) && !pte_same(*ptep, pte)); \
+   set_pte(ptep, pte); \
+})
+
+#define set_pmd_safe(pmdp, pmd) \
+({ \
+   WARN_ON_ONCE(pmd_present(*pmdp) && !pmd_same(*pmdp, pmd)); \
+   set_pmd(pmdp, pmd); \
+})
+
+#define set_pud_safe(pudp, pud) \
+({ \
+   WARN_ON_ONCE(pud_present(*pudp) && !pud_same(*pudp, pud)); \
+   set_pud(pudp, pud); \
+})
+
+#define set_p4d_safe(p4dp, p4d) \
+({ \
+   WARN_ON_ONCE(p4d_present(*p4dp) && !p4d_same(*p4dp, p4d)); \
+   set_p4d(p4dp, p4d); \
+})
+
+#define set_pgd_safe(pgdp, pgd) \
+({ \
+   WARN_ON_ONCE(pgd_present(*pgdp) && !pgd_same(*pgdp, pgd)); \
+   set_pgd(pgdp, pgd); \
+})
+
 #ifndef __HAVE_ARCH_DO_SWAP_PAGE
 /*
  * Some architectures support metadata associated with a page. When a


[tip:x86/mm] generic/pgtable: Introduce set_pte_safe()

2018-12-05 Thread tip-bot for Dan Williams
Commit-ID:  4369deaa2f022ef92da45a0e7eec8a4a52e8e8a4
Gitweb: https://git.kernel.org/tip/4369deaa2f022ef92da45a0e7eec8a4a52e8e8a4
Author: Dan Williams 
AuthorDate: Tue, 4 Dec 2018 13:37:16 -0800
Committer:  Ingo Molnar 
CommitDate: Wed, 5 Dec 2018 09:03:06 +0100

generic/pgtable: Introduce set_pte_safe()

Commit:

  f77084d96355 "x86/mm/pat: Disable preemption around __flush_tlb_all()"

introduced a warning to capture cases __flush_tlb_all() is called without
pre-emption disabled. It triggers a false positive warning in the memory
hotplug path.

On investigation it was found that the __flush_tlb_all() calls are not
necessary. However, they are only "not necessary" in practice provided
the ptes are being initially populated from the !present state.

Introduce set_pte_safe() as a sanity check that the pte is being updated
in a way that does not require a TLB flush.

Forgive the macro, the availability of the various of set_pte() levels
is hit and miss across architectures.

[ mingo: Minor readability edits. ]

Suggested-by: Peter Zijlstra 
Suggested-by: Dave Hansen 
Signed-off-by: Dan Williams 
Acked-by: Peter Zijlstra (Intel) 
Acked-by: Kirill A. Shutemov 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Rik van Riel 
Cc: Sebastian Andrzej Siewior 
Cc: Thomas Gleixner 
Link: https://lkml.kernel.org/r/279dadae-9148-465c-7ec6-3f37e026c...@intel.com
Signed-off-by: Ingo Molnar 
---
 include/asm-generic/pgtable.h | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index dae7f98babed..a9cac82e9a7a 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -400,6 +400,44 @@ static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
 }
 #endif
 
+/*
+ * Use set_p*_safe(), and elide TLB flushing, when confident that *no*
+ * TLB flush will be required as a result of the "set". For example, use
+ * in scenarios where it is known ahead of time that the routine is
+ * setting non-present entries, or re-setting an existing entry to the
+ * same value. Otherwise, use the typical "set" helpers and flush the
+ * TLB.
+ */
+#define set_pte_safe(ptep, pte) \
+({ \
+   WARN_ON_ONCE(pte_present(*ptep) && !pte_same(*ptep, pte)); \
+   set_pte(ptep, pte); \
+})
+
+#define set_pmd_safe(pmdp, pmd) \
+({ \
+   WARN_ON_ONCE(pmd_present(*pmdp) && !pmd_same(*pmdp, pmd)); \
+   set_pmd(pmdp, pmd); \
+})
+
+#define set_pud_safe(pudp, pud) \
+({ \
+   WARN_ON_ONCE(pud_present(*pudp) && !pud_same(*pudp, pud)); \
+   set_pud(pudp, pud); \
+})
+
+#define set_p4d_safe(p4dp, p4d) \
+({ \
+   WARN_ON_ONCE(p4d_present(*p4dp) && !p4d_same(*p4dp, p4d)); \
+   set_p4d(p4dp, p4d); \
+})
+
+#define set_pgd_safe(pgdp, pgd) \
+({ \
+   WARN_ON_ONCE(pgd_present(*pgdp) && !pgd_same(*pgdp, pgd)); \
+   set_pgd(pgdp, pgd); \
+})
+
 #ifndef __HAVE_ARCH_DO_SWAP_PAGE
 /*
  * Some architectures support metadata associated with a page. When a


[tip:x86/mm] x86/mm: Validate kernel_physical_mapping_init() PTE population

2018-12-05 Thread tip-bot for Dan Williams
Commit-ID:  0a9fe8ca844d43f3f547f0e166122b6048121c8f
Gitweb: https://git.kernel.org/tip/0a9fe8ca844d43f3f547f0e166122b6048121c8f
Author: Dan Williams 
AuthorDate: Tue, 4 Dec 2018 13:37:21 -0800
Committer:  Ingo Molnar 
CommitDate: Wed, 5 Dec 2018 09:03:06 +0100

x86/mm: Validate kernel_physical_mapping_init() PTE population

The usage of __flush_tlb_all() in the kernel_physical_mapping_init()
path is not necessary. In general flushing the TLB is not required when
updating an entry from the !present state. However, to give confidence
in the future removal of TLB flushing in this path, use the new
set_pte_safe() family of helpers to assert that the !present assumption
is true in this path.

[ mingo: Minor readability edits. ]

Suggested-by: Peter Zijlstra 
Suggested-by: Dave Hansen 
Signed-off-by: Dan Williams 
Acked-by: Kirill A. Shutemov 
Acked-by: Peter Zijlstra (Intel) 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Rik van Riel 
Cc: Sebastian Andrzej Siewior 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/154395944177.32119.8524957429632012270.st...@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/pgalloc.h   | 27 +++
 arch/x86/mm/init_64.c| 24 
 include/asm-generic/5level-fixup.h   |  1 +
 include/asm-generic/pgtable-nop4d-hack.h |  1 +
 include/asm-generic/pgtable-nop4d.h  |  1 +
 include/asm-generic/pgtable-nopud.h  |  1 +
 6 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index ec7f43327033..1ea41aaef68b 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -80,6 +80,13 @@ static inline void pmd_populate_kernel(struct mm_struct *mm,
set_pmd(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
 }
 
+static inline void pmd_populate_kernel_safe(struct mm_struct *mm,
+  pmd_t *pmd, pte_t *pte)
+{
+   paravirt_alloc_pte(mm, __pa(pte) >> PAGE_SHIFT);
+   set_pmd_safe(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
+}
+
 static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
struct page *pte)
 {
@@ -132,6 +139,12 @@ static inline void pud_populate(struct mm_struct *mm, 
pud_t *pud, pmd_t *pmd)
paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT);
set_pud(pud, __pud(_PAGE_TABLE | __pa(pmd)));
 }
+
+static inline void pud_populate_safe(struct mm_struct *mm, pud_t *pud, pmd_t 
*pmd)
+{
+   paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT);
+   set_pud_safe(pud, __pud(_PAGE_TABLE | __pa(pmd)));
+}
 #endif /* CONFIG_X86_PAE */
 
 #if CONFIG_PGTABLE_LEVELS > 3
@@ -141,6 +154,12 @@ static inline void p4d_populate(struct mm_struct *mm, 
p4d_t *p4d, pud_t *pud)
set_p4d(p4d, __p4d(_PAGE_TABLE | __pa(pud)));
 }
 
+static inline void p4d_populate_safe(struct mm_struct *mm, p4d_t *p4d, pud_t 
*pud)
+{
+   paravirt_alloc_pud(mm, __pa(pud) >> PAGE_SHIFT);
+   set_p4d_safe(p4d, __p4d(_PAGE_TABLE | __pa(pud)));
+}
+
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
gfp_t gfp = GFP_KERNEL_ACCOUNT;
@@ -173,6 +192,14 @@ static inline void pgd_populate(struct mm_struct *mm, 
pgd_t *pgd, p4d_t *p4d)
set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(p4d)));
 }
 
+static inline void pgd_populate_safe(struct mm_struct *mm, pgd_t *pgd, p4d_t 
*p4d)
+{
+   if (!pgtable_l5_enabled())
+   return;
+   paravirt_alloc_p4d(mm, __pa(p4d) >> PAGE_SHIFT);
+   set_pgd_safe(pgd, __pgd(_PAGE_TABLE | __pa(p4d)));
+}
+
 static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
gfp_t gfp = GFP_KERNEL_ACCOUNT;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 5fab264948c2..3e25ac2793ef 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -432,7 +432,7 @@ phys_pte_init(pte_t *pte_page, unsigned long paddr, 
unsigned long paddr_end,
 E820_TYPE_RAM) &&
!e820__mapped_any(paddr & PAGE_MASK, paddr_next,
 E820_TYPE_RESERVED_KERN))
-   set_pte(pte, __pte(0));
+   set_pte_safe(pte, __pte(0));
continue;
}
 
@@ -452,7 +452,7 @@ phys_pte_init(pte_t *pte_page, unsigned long paddr, 
unsigned long paddr_end,
pr_info("   pte=%p addr=%lx pte=%016lx\n", pte, paddr,
pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL).pte);
pages++;
-   set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
+   set_pte_safe(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
paddr_last = (paddr & PAGE_MASK) + PAGE_SIZE;
}
 
@@ -487,7 +487,7 @@ phys_pmd_init(pmd_t 

[tip:x86/mm] x86/mm: Validate kernel_physical_mapping_init() PTE population

2018-12-05 Thread tip-bot for Dan Williams
Commit-ID:  0a9fe8ca844d43f3f547f0e166122b6048121c8f
Gitweb: https://git.kernel.org/tip/0a9fe8ca844d43f3f547f0e166122b6048121c8f
Author: Dan Williams 
AuthorDate: Tue, 4 Dec 2018 13:37:21 -0800
Committer:  Ingo Molnar 
CommitDate: Wed, 5 Dec 2018 09:03:06 +0100

x86/mm: Validate kernel_physical_mapping_init() PTE population

The usage of __flush_tlb_all() in the kernel_physical_mapping_init()
path is not necessary. In general flushing the TLB is not required when
updating an entry from the !present state. However, to give confidence
in the future removal of TLB flushing in this path, use the new
set_pte_safe() family of helpers to assert that the !present assumption
is true in this path.

[ mingo: Minor readability edits. ]

Suggested-by: Peter Zijlstra 
Suggested-by: Dave Hansen 
Signed-off-by: Dan Williams 
Acked-by: Kirill A. Shutemov 
Acked-by: Peter Zijlstra (Intel) 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Rik van Riel 
Cc: Sebastian Andrzej Siewior 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/154395944177.32119.8524957429632012270.st...@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/pgalloc.h   | 27 +++
 arch/x86/mm/init_64.c| 24 
 include/asm-generic/5level-fixup.h   |  1 +
 include/asm-generic/pgtable-nop4d-hack.h |  1 +
 include/asm-generic/pgtable-nop4d.h  |  1 +
 include/asm-generic/pgtable-nopud.h  |  1 +
 6 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index ec7f43327033..1ea41aaef68b 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -80,6 +80,13 @@ static inline void pmd_populate_kernel(struct mm_struct *mm,
set_pmd(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
 }
 
+static inline void pmd_populate_kernel_safe(struct mm_struct *mm,
+  pmd_t *pmd, pte_t *pte)
+{
+   paravirt_alloc_pte(mm, __pa(pte) >> PAGE_SHIFT);
+   set_pmd_safe(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
+}
+
 static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
struct page *pte)
 {
@@ -132,6 +139,12 @@ static inline void pud_populate(struct mm_struct *mm, 
pud_t *pud, pmd_t *pmd)
paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT);
set_pud(pud, __pud(_PAGE_TABLE | __pa(pmd)));
 }
+
+static inline void pud_populate_safe(struct mm_struct *mm, pud_t *pud, pmd_t 
*pmd)
+{
+   paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT);
+   set_pud_safe(pud, __pud(_PAGE_TABLE | __pa(pmd)));
+}
 #endif /* CONFIG_X86_PAE */
 
 #if CONFIG_PGTABLE_LEVELS > 3
@@ -141,6 +154,12 @@ static inline void p4d_populate(struct mm_struct *mm, 
p4d_t *p4d, pud_t *pud)
set_p4d(p4d, __p4d(_PAGE_TABLE | __pa(pud)));
 }
 
+static inline void p4d_populate_safe(struct mm_struct *mm, p4d_t *p4d, pud_t 
*pud)
+{
+   paravirt_alloc_pud(mm, __pa(pud) >> PAGE_SHIFT);
+   set_p4d_safe(p4d, __p4d(_PAGE_TABLE | __pa(pud)));
+}
+
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
gfp_t gfp = GFP_KERNEL_ACCOUNT;
@@ -173,6 +192,14 @@ static inline void pgd_populate(struct mm_struct *mm, 
pgd_t *pgd, p4d_t *p4d)
set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(p4d)));
 }
 
+static inline void pgd_populate_safe(struct mm_struct *mm, pgd_t *pgd, p4d_t 
*p4d)
+{
+   if (!pgtable_l5_enabled())
+   return;
+   paravirt_alloc_p4d(mm, __pa(p4d) >> PAGE_SHIFT);
+   set_pgd_safe(pgd, __pgd(_PAGE_TABLE | __pa(p4d)));
+}
+
 static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
gfp_t gfp = GFP_KERNEL_ACCOUNT;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 5fab264948c2..3e25ac2793ef 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -432,7 +432,7 @@ phys_pte_init(pte_t *pte_page, unsigned long paddr, 
unsigned long paddr_end,
 E820_TYPE_RAM) &&
!e820__mapped_any(paddr & PAGE_MASK, paddr_next,
 E820_TYPE_RESERVED_KERN))
-   set_pte(pte, __pte(0));
+   set_pte_safe(pte, __pte(0));
continue;
}
 
@@ -452,7 +452,7 @@ phys_pte_init(pte_t *pte_page, unsigned long paddr, 
unsigned long paddr_end,
pr_info("   pte=%p addr=%lx pte=%016lx\n", pte, paddr,
pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL).pte);
pages++;
-   set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
+   set_pte_safe(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
paddr_last = (paddr & PAGE_MASK) + PAGE_SIZE;
}
 
@@ -487,7 +487,7 @@ phys_pmd_init(pmd_t 

[tip:x86/asm] x86/vdso: Remove a stale/misleading comment from the linker script

2018-12-05 Thread tip-bot for Sean Christopherson
Commit-ID:  29434801e7e9c6d05fbea4533b3c0bd6be612f62
Gitweb: https://git.kernel.org/tip/29434801e7e9c6d05fbea4533b3c0bd6be612f62
Author: Sean Christopherson 
AuthorDate: Tue, 4 Dec 2018 13:25:58 -0800
Committer:  Ingo Molnar 
CommitDate: Wed, 5 Dec 2018 08:58:12 +0100

x86/vdso: Remove a stale/misleading comment from the linker script

Once upon a time, vdso2c aggressively stripped data from the vDSO
image when generating the final userspace image.  This included
stripping the .altinstructions and .altinstr_replacement sections.
Eventually, the stripping process reverted to "objdump -S" and no
longer removed the aforementioned sections, but the comment remained.

Keeping the .alt* sections at the end of the PT_LOAD segment is no
longer necessary, but there's no harm in doing so and it's a helpful
reminder that they don't need to be included in the final vDSO image,
i.e. someone may want to take another stab at zapping/stripping the
unneeded sections.

Signed-off-by: Sean Christopherson 
Acked-by: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Dave Hansen 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Fixes: da861e18eccc ("x86, vdso: Get rid of the fake section mechanism")
Link: 
http://lkml.kernel.org/r/20181204212600.28090-3-sean.j.christopher...@intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/entry/vdso/vdso-layout.lds.S | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S 
b/arch/x86/entry/vdso/vdso-layout.lds.S
index 0cedc905c8d6..93c6dc7812d0 100644
--- a/arch/x86/entry/vdso/vdso-layout.lds.S
+++ b/arch/x86/entry/vdso/vdso-layout.lds.S
@@ -65,11 +65,6 @@ SECTIONS
 
.text   : { *(.text*) } :text   =0x90909090,
 
-   /*
-* At the end so that eu-elflint stays happy when vdso2c strips
-* these.  A better implementation would avoid allocating space
-* for these.
-*/
.altinstructions: { *(.altinstructions) }   :text
.altinstr_replacement   : { *(.altinstr_replacement) }  :text
 


[tip:x86/asm] x86/vdso: Remove a stale/misleading comment from the linker script

2018-12-05 Thread tip-bot for Sean Christopherson
Commit-ID:  29434801e7e9c6d05fbea4533b3c0bd6be612f62
Gitweb: https://git.kernel.org/tip/29434801e7e9c6d05fbea4533b3c0bd6be612f62
Author: Sean Christopherson 
AuthorDate: Tue, 4 Dec 2018 13:25:58 -0800
Committer:  Ingo Molnar 
CommitDate: Wed, 5 Dec 2018 08:58:12 +0100

x86/vdso: Remove a stale/misleading comment from the linker script

Once upon a time, vdso2c aggressively stripped data from the vDSO
image when generating the final userspace image.  This included
stripping the .altinstructions and .altinstr_replacement sections.
Eventually, the stripping process reverted to "objdump -S" and no
longer removed the aforementioned sections, but the comment remained.

Keeping the .alt* sections at the end of the PT_LOAD segment is no
longer necessary, but there's no harm in doing so and it's a helpful
reminder that they don't need to be included in the final vDSO image,
i.e. someone may want to take another stab at zapping/stripping the
unneeded sections.

Signed-off-by: Sean Christopherson 
Acked-by: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Dave Hansen 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Fixes: da861e18eccc ("x86, vdso: Get rid of the fake section mechanism")
Link: 
http://lkml.kernel.org/r/20181204212600.28090-3-sean.j.christopher...@intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/entry/vdso/vdso-layout.lds.S | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S 
b/arch/x86/entry/vdso/vdso-layout.lds.S
index 0cedc905c8d6..93c6dc7812d0 100644
--- a/arch/x86/entry/vdso/vdso-layout.lds.S
+++ b/arch/x86/entry/vdso/vdso-layout.lds.S
@@ -65,11 +65,6 @@ SECTIONS
 
.text   : { *(.text*) } :text   =0x90909090,
 
-   /*
-* At the end so that eu-elflint stays happy when vdso2c strips
-* these.  A better implementation would avoid allocating space
-* for these.
-*/
.altinstructions: { *(.altinstructions) }   :text
.altinstr_replacement   : { *(.altinstr_replacement) }  :text
 


[tip:x86/mm] generic/pgtable: Introduce {p4d,pgd}_same()

2018-12-05 Thread tip-bot for Dan Williams
Commit-ID:  0cebbb60f759a709dabb3c87b9704f9844878850
Gitweb: https://git.kernel.org/tip/0cebbb60f759a709dabb3c87b9704f9844878850
Author: Dan Williams 
AuthorDate: Tue, 4 Dec 2018 13:37:11 -0800
Committer:  Ingo Molnar 
CommitDate: Wed, 5 Dec 2018 09:03:06 +0100

generic/pgtable: Introduce {p4d,pgd}_same()

In preparation for introducing '_safe' versions of page table entry 'set'
helpers, introduce generic versions of p4d_same() and pgd_same().

Signed-off-by: Dan Williams 
Acked-by: Kirill A. Shutemov 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Sebastian Andrzej Siewior 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/154395943153.32119.1733586547359626311.st...@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar 
---
 include/asm-generic/pgtable.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index eea50ef8b8cd..dae7f98babed 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -386,6 +386,20 @@ static inline int pud_same(pud_t pud_a, pud_t pud_b)
 }
 #endif
 
+#ifndef __HAVE_ARCH_P4D_SAME
+static inline int p4d_same(p4d_t p4d_a, p4d_t p4d_b)
+{
+   return p4d_val(p4d_a) == p4d_val(p4d_b);
+}
+#endif
+
+#ifndef __HAVE_ARCH_PGD_SAME
+static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
+{
+   return pgd_val(pgd_a) == pgd_val(pgd_b);
+}
+#endif
+
 #ifndef __HAVE_ARCH_DO_SWAP_PAGE
 /*
  * Some architectures support metadata associated with a page. When a


[tip:x86/mm] generic/pgtable: Introduce {p4d,pgd}_same()

2018-12-05 Thread tip-bot for Dan Williams
Commit-ID:  0cebbb60f759a709dabb3c87b9704f9844878850
Gitweb: https://git.kernel.org/tip/0cebbb60f759a709dabb3c87b9704f9844878850
Author: Dan Williams 
AuthorDate: Tue, 4 Dec 2018 13:37:11 -0800
Committer:  Ingo Molnar 
CommitDate: Wed, 5 Dec 2018 09:03:06 +0100

generic/pgtable: Introduce {p4d,pgd}_same()

In preparation for introducing '_safe' versions of page table entry 'set'
helpers, introduce generic versions of p4d_same() and pgd_same().

Signed-off-by: Dan Williams 
Acked-by: Kirill A. Shutemov 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Sebastian Andrzej Siewior 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/154395943153.32119.1733586547359626311.st...@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar 
---
 include/asm-generic/pgtable.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index eea50ef8b8cd..dae7f98babed 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -386,6 +386,20 @@ static inline int pud_same(pud_t pud_a, pud_t pud_b)
 }
 #endif
 
+#ifndef __HAVE_ARCH_P4D_SAME
+static inline int p4d_same(p4d_t p4d_a, p4d_t p4d_b)
+{
+   return p4d_val(p4d_a) == p4d_val(p4d_b);
+}
+#endif
+
+#ifndef __HAVE_ARCH_PGD_SAME
+static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
+{
+   return pgd_val(pgd_a) == pgd_val(pgd_b);
+}
+#endif
+
 #ifndef __HAVE_ARCH_DO_SWAP_PAGE
 /*
  * Some architectures support metadata associated with a page. When a


[tip:x86/mm] generic/pgtable: Make {pmd, pud}_same() unconditionally available

2018-12-05 Thread tip-bot for Dan Williams
Commit-ID:  c683c37cd13246941924c48f6c6a9863425e0cec
Gitweb: https://git.kernel.org/tip/c683c37cd13246941924c48f6c6a9863425e0cec
Author: Dan Williams 
AuthorDate: Tue, 4 Dec 2018 13:37:06 -0800
Committer:  Ingo Molnar 
CommitDate: Wed, 5 Dec 2018 09:03:05 +0100

generic/pgtable: Make {pmd, pud}_same() unconditionally available

In preparation for {pmd,pud}_same() to be used outside of transparent
huge page code paths, make them unconditionally available. This enables
them to be used in the definition of a new family of
set_{pte,pmd,pud,p4d,pgd}_safe() helpers.

Signed-off-by: Dan Williams 
Acked-by: Kirill A. Shutemov 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Sebastian Andrzej Siewior 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/154395942644.32119.10238934183949418128.st...@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar 
---
 include/asm-generic/pgtable.h | 14 --
 1 file changed, 14 deletions(-)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 359fb935ded6..eea50ef8b8cd 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -375,7 +375,6 @@ static inline int pte_unused(pte_t pte)
 #endif
 
 #ifndef __HAVE_ARCH_PMD_SAME
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b)
 {
return pmd_val(pmd_a) == pmd_val(pmd_b);
@@ -385,19 +384,6 @@ static inline int pud_same(pud_t pud_a, pud_t pud_b)
 {
return pud_val(pud_a) == pud_val(pud_b);
 }
-#else /* CONFIG_TRANSPARENT_HUGEPAGE */
-static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b)
-{
-   BUILD_BUG();
-   return 0;
-}
-
-static inline int pud_same(pud_t pud_a, pud_t pud_b)
-{
-   BUILD_BUG();
-   return 0;
-}
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 #endif
 
 #ifndef __HAVE_ARCH_DO_SWAP_PAGE


[tip:x86/mm] generic/pgtable: Make {pmd, pud}_same() unconditionally available

2018-12-05 Thread tip-bot for Dan Williams
Commit-ID:  c683c37cd13246941924c48f6c6a9863425e0cec
Gitweb: https://git.kernel.org/tip/c683c37cd13246941924c48f6c6a9863425e0cec
Author: Dan Williams 
AuthorDate: Tue, 4 Dec 2018 13:37:06 -0800
Committer:  Ingo Molnar 
CommitDate: Wed, 5 Dec 2018 09:03:05 +0100

generic/pgtable: Make {pmd, pud}_same() unconditionally available

In preparation for {pmd,pud}_same() to be used outside of transparent
huge page code paths, make them unconditionally available. This enables
them to be used in the definition of a new family of
set_{pte,pmd,pud,p4d,pgd}_safe() helpers.

Signed-off-by: Dan Williams 
Acked-by: Kirill A. Shutemov 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Sebastian Andrzej Siewior 
Cc: Thomas Gleixner 
Link: 
http://lkml.kernel.org/r/154395942644.32119.10238934183949418128.st...@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar 
---
 include/asm-generic/pgtable.h | 14 --
 1 file changed, 14 deletions(-)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 359fb935ded6..eea50ef8b8cd 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -375,7 +375,6 @@ static inline int pte_unused(pte_t pte)
 #endif
 
 #ifndef __HAVE_ARCH_PMD_SAME
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b)
 {
return pmd_val(pmd_a) == pmd_val(pmd_b);
@@ -385,19 +384,6 @@ static inline int pud_same(pud_t pud_a, pud_t pud_b)
 {
return pud_val(pud_a) == pud_val(pud_b);
 }
-#else /* CONFIG_TRANSPARENT_HUGEPAGE */
-static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b)
-{
-   BUILD_BUG();
-   return 0;
-}
-
-static inline int pud_same(pud_t pud_a, pud_t pud_b)
-{
-   BUILD_BUG();
-   return 0;
-}
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 #endif
 
 #ifndef __HAVE_ARCH_DO_SWAP_PAGE


[tip:x86/asm] x86/vdso: Remove obsolete "fake section table" reservation

2018-12-05 Thread tip-bot for Sean Christopherson
Commit-ID:  24b7c77bbb24f129beead93574ff248c3db1288e
Gitweb: https://git.kernel.org/tip/24b7c77bbb24f129beead93574ff248c3db1288e
Author: Sean Christopherson 
AuthorDate: Tue, 4 Dec 2018 13:25:57 -0800
Committer:  Ingo Molnar 
CommitDate: Wed, 5 Dec 2018 08:58:11 +0100

x86/vdso: Remove obsolete "fake section table" reservation

At one point the vDSO image was manually stripped down by vdso2c in an
attempt to minimize the size of the image mapped into userspace.  Part
of that stripping process involved building a fake section table so as
not to break userspace processes that parse the section table.  Memory
for the fake section table was reserved in the .rodata section so that
vdso2c could simply copy the entire PT_LOAD segment into the userspace
image after building the fake table.

Eventually, the entire fake section table approach was dropped in favor
of stripping the vdso "the old fashioned way", i.e. via objdump -S.
But, the reservation in .rodata for the fake table was left behind.
Remove the reserveration along with a few other related defines and
section entries.

Removing the fake section table placeholder zaps a whopping 0x340 bytes
from the 64-bit vDSO image, which drops the current image's size to
under 4k, i.e. reduces the effective size of the userspace vDSO mapping
by a full page.

Signed-off-by: Sean Christopherson 
Acked-by: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Dave Hansen 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Fixes: da861e18eccc ("x86, vdso: Get rid of the fake section mechanism")
Link: 
http://lkml.kernel.org/r/20181204212600.28090-2-sean.j.christopher...@intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/entry/vdso/vdso-layout.lds.S | 22 --
 arch/x86/entry/vdso/vdso2c.c  |  8 
 2 files changed, 30 deletions(-)

diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S 
b/arch/x86/entry/vdso/vdso-layout.lds.S
index acfd5ba7d943..0cedc905c8d6 100644
--- a/arch/x86/entry/vdso/vdso-layout.lds.S
+++ b/arch/x86/entry/vdso/vdso-layout.lds.S
@@ -7,16 +7,6 @@
  * This script controls its layout.
  */
 
-#if defined(BUILD_VDSO64)
-# define SHDR_SIZE 64
-#elif defined(BUILD_VDSO32) || defined(BUILD_VDSOX32)
-# define SHDR_SIZE 40
-#else
-# error unknown VDSO target
-#endif
-
-#define NUM_FAKE_SHDRS 13
-
 SECTIONS
 {
/*
@@ -60,20 +50,8 @@ SECTIONS
*(.bss*)
*(.dynbss*)
*(.gnu.linkonce.b.*)
-
-   /*
-* Ideally this would live in a C file, but that won't
-* work cleanly for x32 until we start building the x32
-* C code using an x32 toolchain.
-*/
-   VDSO_FAKE_SECTION_TABLE_START = .;
-   . = . + NUM_FAKE_SHDRS * SHDR_SIZE;
-   VDSO_FAKE_SECTION_TABLE_END = .;
}   :text
 
-   .fake_shstrtab  : { *(.fake_shstrtab) } :text
-
-
.note   : { *(.note.*) }:text   :note
 
.eh_frame_hdr   : { *(.eh_frame_hdr) }  :text   :eh_frame_hdr
diff --git a/arch/x86/entry/vdso/vdso2c.c b/arch/x86/entry/vdso/vdso2c.c
index 4674f58581a1..8e470b018512 100644
--- a/arch/x86/entry/vdso/vdso2c.c
+++ b/arch/x86/entry/vdso/vdso2c.c
@@ -76,8 +76,6 @@ enum {
sym_hpet_page,
sym_pvclock_page,
sym_hvclock_page,
-   sym_VDSO_FAKE_SECTION_TABLE_START,
-   sym_VDSO_FAKE_SECTION_TABLE_END,
 };
 
 const int special_pages[] = {
@@ -98,12 +96,6 @@ struct vdso_sym required_syms[] = {
[sym_hpet_page] = {"hpet_page", true},
[sym_pvclock_page] = {"pvclock_page", true},
[sym_hvclock_page] = {"hvclock_page", true},
-   [sym_VDSO_FAKE_SECTION_TABLE_START] = {
-   "VDSO_FAKE_SECTION_TABLE_START", false
-   },
-   [sym_VDSO_FAKE_SECTION_TABLE_END] = {
-   "VDSO_FAKE_SECTION_TABLE_END", false
-   },
{"VDSO32_NOTE_MASK", true},
{"__kernel_vsyscall", true},
{"__kernel_sigreturn", true},


[tip:x86/asm] x86/vdso: Remove obsolete "fake section table" reservation

2018-12-05 Thread tip-bot for Sean Christopherson
Commit-ID:  24b7c77bbb24f129beead93574ff248c3db1288e
Gitweb: https://git.kernel.org/tip/24b7c77bbb24f129beead93574ff248c3db1288e
Author: Sean Christopherson 
AuthorDate: Tue, 4 Dec 2018 13:25:57 -0800
Committer:  Ingo Molnar 
CommitDate: Wed, 5 Dec 2018 08:58:11 +0100

x86/vdso: Remove obsolete "fake section table" reservation

At one point the vDSO image was manually stripped down by vdso2c in an
attempt to minimize the size of the image mapped into userspace.  Part
of that stripping process involved building a fake section table so as
not to break userspace processes that parse the section table.  Memory
for the fake section table was reserved in the .rodata section so that
vdso2c could simply copy the entire PT_LOAD segment into the userspace
image after building the fake table.

Eventually, the entire fake section table approach was dropped in favor
of stripping the vdso "the old fashioned way", i.e. via objdump -S.
But, the reservation in .rodata for the fake table was left behind.
Remove the reserveration along with a few other related defines and
section entries.

Removing the fake section table placeholder zaps a whopping 0x340 bytes
from the 64-bit vDSO image, which drops the current image's size to
under 4k, i.e. reduces the effective size of the userspace vDSO mapping
by a full page.

Signed-off-by: Sean Christopherson 
Acked-by: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Dave Hansen 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Fixes: da861e18eccc ("x86, vdso: Get rid of the fake section mechanism")
Link: 
http://lkml.kernel.org/r/20181204212600.28090-2-sean.j.christopher...@intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/entry/vdso/vdso-layout.lds.S | 22 --
 arch/x86/entry/vdso/vdso2c.c  |  8 
 2 files changed, 30 deletions(-)

diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S 
b/arch/x86/entry/vdso/vdso-layout.lds.S
index acfd5ba7d943..0cedc905c8d6 100644
--- a/arch/x86/entry/vdso/vdso-layout.lds.S
+++ b/arch/x86/entry/vdso/vdso-layout.lds.S
@@ -7,16 +7,6 @@
  * This script controls its layout.
  */
 
-#if defined(BUILD_VDSO64)
-# define SHDR_SIZE 64
-#elif defined(BUILD_VDSO32) || defined(BUILD_VDSOX32)
-# define SHDR_SIZE 40
-#else
-# error unknown VDSO target
-#endif
-
-#define NUM_FAKE_SHDRS 13
-
 SECTIONS
 {
/*
@@ -60,20 +50,8 @@ SECTIONS
*(.bss*)
*(.dynbss*)
*(.gnu.linkonce.b.*)
-
-   /*
-* Ideally this would live in a C file, but that won't
-* work cleanly for x32 until we start building the x32
-* C code using an x32 toolchain.
-*/
-   VDSO_FAKE_SECTION_TABLE_START = .;
-   . = . + NUM_FAKE_SHDRS * SHDR_SIZE;
-   VDSO_FAKE_SECTION_TABLE_END = .;
}   :text
 
-   .fake_shstrtab  : { *(.fake_shstrtab) } :text
-
-
.note   : { *(.note.*) }:text   :note
 
.eh_frame_hdr   : { *(.eh_frame_hdr) }  :text   :eh_frame_hdr
diff --git a/arch/x86/entry/vdso/vdso2c.c b/arch/x86/entry/vdso/vdso2c.c
index 4674f58581a1..8e470b018512 100644
--- a/arch/x86/entry/vdso/vdso2c.c
+++ b/arch/x86/entry/vdso/vdso2c.c
@@ -76,8 +76,6 @@ enum {
sym_hpet_page,
sym_pvclock_page,
sym_hvclock_page,
-   sym_VDSO_FAKE_SECTION_TABLE_START,
-   sym_VDSO_FAKE_SECTION_TABLE_END,
 };
 
 const int special_pages[] = {
@@ -98,12 +96,6 @@ struct vdso_sym required_syms[] = {
[sym_hpet_page] = {"hpet_page", true},
[sym_pvclock_page] = {"pvclock_page", true},
[sym_hvclock_page] = {"hvclock_page", true},
-   [sym_VDSO_FAKE_SECTION_TABLE_START] = {
-   "VDSO_FAKE_SECTION_TABLE_START", false
-   },
-   [sym_VDSO_FAKE_SECTION_TABLE_END] = {
-   "VDSO_FAKE_SECTION_TABLE_END", false
-   },
{"VDSO32_NOTE_MASK", true},
{"__kernel_vsyscall", true},
{"__kernel_sigreturn", true},


Re: [PATCH v1 5/5] perf cs-etm: Track exception number

2018-12-05 Thread Mathieu Poirier
On Tue, 4 Dec 2018 at 20:49,  wrote:
>
> On Mon, Nov 19, 2018 at 01:47:49PM -0700, Mathieu Poirier wrote:
> > On Sun, Nov 11, 2018 at 12:59:43PM +0800, Leo Yan wrote:
> > > When an exception packet comes, it contains the info for exception
> > > number; the exception number indicates the exception types, so from it
> > > we can know if the exception is taken for interrupt, system call or
> > > other traps, etc.  But because the exception return packet cannot
> > > delivery exception number correctly by decoder thus when prepare sample
> > > flags we cannot know what's type for exception return.
> > >
> > > This patch adds a new 'exc_num' array in decoder structure to record
> > > exception number per CPU, the exception number is recorded in the array
> > > when the exception packet comes and this exception number can be used by
> > > exception return packet.  If detect there have discontinuous trace with
> > > TRACE_ON or TRACE_OFF packet, the exception number is set to invalid
> > > value.
> > >
> > > Signed-off-by: Leo Yan 
> > > ---
> > >  tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 67 
> > > ++---
> > >  1 file changed, 59 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c 
> > > b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> > > index b8cb7a3e..d1a6cbc 100644
> > > --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> > > +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> > > @@ -43,6 +43,7 @@ struct cs_etm_decoder {
> > > u32 packet_count;
> > > u32 head;
> > > u32 tail;
> > > +   u32 *exc_num;
> > > struct cs_etm_packet packet_buffer[MAX_BUFFER];
> > >  };
> > >
> > > @@ -368,24 +369,64 @@ static ocsd_datapath_resp_t
> > >  cs_etm_decoder__buffer_trace_off(struct cs_etm_decoder *decoder,
> > >  const uint8_t trace_chan_id)
> > >  {
> > > -   return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
> > > -CS_ETM_TRACE_OFF);
> > > +   int ret;
> > > +   struct cs_etm_packet *packet;
> > > +
> > > +   ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
> > > +   CS_ETM_TRACE_OFF);
> > > +   if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
> > > +   return ret;
> > > +
> > > +   packet = >packet_buffer[decoder->tail];
> > > +
> > > +   /* Clear execption number for discontinuous trace */
> > > +   decoder->exc_num[packet->cpu] = UINT32_MAX;
> > > +
> > > +   return ret;
> > >  }
> > >
> > >  static ocsd_datapath_resp_t
> > >  cs_etm_decoder__buffer_trace_on(struct cs_etm_decoder *decoder,
> > > const uint8_t trace_chan_id)
> > >  {
> > > -   return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
> > > -CS_ETM_TRACE_ON);
> > > +   int ret;
> > > +   struct cs_etm_packet *packet;
> > > +
> > > +   ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
> > > +   CS_ETM_TRACE_ON);
> > > +   if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
> > > +   return ret;
> > > +
> > > +   packet = >packet_buffer[decoder->tail];
> > > +
> > > +   /* Clear execption number for discontinuous trace */
> > > +   decoder->exc_num[packet->cpu] = UINT32_MAX;
> > > +
> > > +   return ret;
> > >  }
> > >
> > >  static ocsd_datapath_resp_t
> > >  cs_etm_decoder__buffer_exception(struct cs_etm_decoder *decoder,
> > > +const ocsd_generic_trace_elem *elem,
> > >  const uint8_t trace_chan_id)
> > >  {
> > > -   return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
> > > -CS_ETM_EXCEPTION);
> > > +   int ret;
> > > +   struct cs_etm_packet *packet;
> > > +
> > > +   ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
> > > +   CS_ETM_EXCEPTION);
> > > +   if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
> > > +   return ret;
> > > +
> > > +   packet = >packet_buffer[decoder->tail];
> > > +
> > > +   /*
> > > +* Exception number is recorded per CPU and later can be used
> > > +* for exception return instruction analysis.
> > > +*/
> > > +   decoder->exc_num[packet->cpu] = elem->exception_number;
> >
> > Am I missing something or the information about the exception number that is
> > recorded here isn't used anywhere?
>
> The exception number will be used to set branch flag patch [1].

Right, I realised that when I started reviewing that set.  The rule of
thumb here is to introduce code in the same patchset it is used so
that we avoid adding needless code to the kernel.

> According to exception number we can know it's for system call,
> interrupt or other traps.
>
> [1] 
> http://archive.armlinux.org.uk/lurker/message/2018.050755.d1c1b257.en.html
>
> > If you want to use this in perf report/script,
> > the 

Re: [PATCH v1 5/5] perf cs-etm: Track exception number

2018-12-05 Thread Mathieu Poirier
On Tue, 4 Dec 2018 at 20:49,  wrote:
>
> On Mon, Nov 19, 2018 at 01:47:49PM -0700, Mathieu Poirier wrote:
> > On Sun, Nov 11, 2018 at 12:59:43PM +0800, Leo Yan wrote:
> > > When an exception packet comes, it contains the info for exception
> > > number; the exception number indicates the exception types, so from it
> > > we can know if the exception is taken for interrupt, system call or
> > > other traps, etc.  But because the exception return packet cannot
> > > delivery exception number correctly by decoder thus when prepare sample
> > > flags we cannot know what's type for exception return.
> > >
> > > This patch adds a new 'exc_num' array in decoder structure to record
> > > exception number per CPU, the exception number is recorded in the array
> > > when the exception packet comes and this exception number can be used by
> > > exception return packet.  If detect there have discontinuous trace with
> > > TRACE_ON or TRACE_OFF packet, the exception number is set to invalid
> > > value.
> > >
> > > Signed-off-by: Leo Yan 
> > > ---
> > >  tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 67 
> > > ++---
> > >  1 file changed, 59 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c 
> > > b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> > > index b8cb7a3e..d1a6cbc 100644
> > > --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> > > +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> > > @@ -43,6 +43,7 @@ struct cs_etm_decoder {
> > > u32 packet_count;
> > > u32 head;
> > > u32 tail;
> > > +   u32 *exc_num;
> > > struct cs_etm_packet packet_buffer[MAX_BUFFER];
> > >  };
> > >
> > > @@ -368,24 +369,64 @@ static ocsd_datapath_resp_t
> > >  cs_etm_decoder__buffer_trace_off(struct cs_etm_decoder *decoder,
> > >  const uint8_t trace_chan_id)
> > >  {
> > > -   return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
> > > -CS_ETM_TRACE_OFF);
> > > +   int ret;
> > > +   struct cs_etm_packet *packet;
> > > +
> > > +   ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
> > > +   CS_ETM_TRACE_OFF);
> > > +   if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
> > > +   return ret;
> > > +
> > > +   packet = >packet_buffer[decoder->tail];
> > > +
> > > +   /* Clear execption number for discontinuous trace */
> > > +   decoder->exc_num[packet->cpu] = UINT32_MAX;
> > > +
> > > +   return ret;
> > >  }
> > >
> > >  static ocsd_datapath_resp_t
> > >  cs_etm_decoder__buffer_trace_on(struct cs_etm_decoder *decoder,
> > > const uint8_t trace_chan_id)
> > >  {
> > > -   return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
> > > -CS_ETM_TRACE_ON);
> > > +   int ret;
> > > +   struct cs_etm_packet *packet;
> > > +
> > > +   ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
> > > +   CS_ETM_TRACE_ON);
> > > +   if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
> > > +   return ret;
> > > +
> > > +   packet = >packet_buffer[decoder->tail];
> > > +
> > > +   /* Clear execption number for discontinuous trace */
> > > +   decoder->exc_num[packet->cpu] = UINT32_MAX;
> > > +
> > > +   return ret;
> > >  }
> > >
> > >  static ocsd_datapath_resp_t
> > >  cs_etm_decoder__buffer_exception(struct cs_etm_decoder *decoder,
> > > +const ocsd_generic_trace_elem *elem,
> > >  const uint8_t trace_chan_id)
> > >  {
> > > -   return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
> > > -CS_ETM_EXCEPTION);
> > > +   int ret;
> > > +   struct cs_etm_packet *packet;
> > > +
> > > +   ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
> > > +   CS_ETM_EXCEPTION);
> > > +   if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
> > > +   return ret;
> > > +
> > > +   packet = >packet_buffer[decoder->tail];
> > > +
> > > +   /*
> > > +* Exception number is recorded per CPU and later can be used
> > > +* for exception return instruction analysis.
> > > +*/
> > > +   decoder->exc_num[packet->cpu] = elem->exception_number;
> >
> > Am I missing something or the information about the exception number that is
> > recorded here isn't used anywhere?
>
> The exception number will be used to set branch flag patch [1].

Right, I realised that when I started reviewing that set.  The rule of
thumb here is to introduce code in the same patchset it is used so
that we avoid adding needless code to the kernel.

> According to exception number we can know it's for system call,
> interrupt or other traps.
>
> [1] 
> http://archive.armlinux.org.uk/lurker/message/2018.050755.d1c1b257.en.html
>
> > If you want to use this in perf report/script,
> > the 

Re: [PATCH] x86/mce: Streamline MCE subsystem's naming

2018-12-05 Thread Ingo Molnar


* Borislav Petkov  wrote:

> On Wed, Dec 05, 2018 at 05:30:37PM +0100, Ingo Molnar wrote:
> > Would it make sense to organize it a bit more and separate out vendor 
> > specific functionality:
> > 
> >   mce/cpu/intel.c
> >   mce/cpu/intel-p5.c
> >   mce/cpu/amd.c
> >   mce/cpu/winchip.c
> 
> That's too fine-grained IMO and look at the path we'd get then:
> 
> arch/x86/kernel/cpu/mce/cpu/intel.c
>   ^^^ ^^^

Oh - I thought we'd have arch/x86/mce/ or so?

There's machine check events that are not tied to any particular CPU, 
correct? So this would be the right conceptual level - and it would also 
remove the somewhat redundant 'kernel' part.

Thanks,

Ingo


Re: [PATCH] x86/mce: Streamline MCE subsystem's naming

2018-12-05 Thread Ingo Molnar


* Borislav Petkov  wrote:

> On Wed, Dec 05, 2018 at 05:30:37PM +0100, Ingo Molnar wrote:
> > Would it make sense to organize it a bit more and separate out vendor 
> > specific functionality:
> > 
> >   mce/cpu/intel.c
> >   mce/cpu/intel-p5.c
> >   mce/cpu/amd.c
> >   mce/cpu/winchip.c
> 
> That's too fine-grained IMO and look at the path we'd get then:
> 
> arch/x86/kernel/cpu/mce/cpu/intel.c
>   ^^^ ^^^

Oh - I thought we'd have arch/x86/mce/ or so?

There's machine check events that are not tied to any particular CPU, 
correct? So this would be the right conceptual level - and it would also 
remove the somewhat redundant 'kernel' part.

Thanks,

Ingo


Re: [RFC PATCH 02/14] mm/hms: heterogenenous memory system (HMS) documentation

2018-12-05 Thread Jerome Glisse
On Wed, Dec 05, 2018 at 10:25:31AM -0700, Logan Gunthorpe wrote:
> 
> 
> On 2018-12-04 7:37 p.m., Jerome Glisse wrote:
> >>
> >> This came up before for apis even better defined than HMS as well as
> >> more limited scope, i.e. experimental ABI availability only for -rc
> >> kernels. Linus said this:
> >>
> >> "There are no loopholes. No "but it's been only one release". No, no,
> >> no. The whole point is that users are supposed to be able to *trust*
> >> the kernel. If we do something, we keep on doing it.
> >>
> >> And if it makes it harder to add new user-visible interfaces, then
> >> that's a *good* thing." [1]
> >>
> >> The takeaway being don't land work-in-progress ABIs in the kernel.
> >> Once an application depends on it, there are no more incompatible
> >> changes possible regardless of the warnings, experimental notices, or
> >> "staging" designation. DAX is experimental because there are cases
> >> where it currently does not work with respect to another kernel
> >> feature like xfs-reflink, RDMA. The plan is to fix those, not continue
> >> to hide behind an experimental designation, and fix them in a way that
> >> preserves the user visible behavior that has already been exposed,
> >> i.e. no regressions.
> >>
> >> [1]: 
> >> https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-August/004742.html
> > 
> > So i guess i am heading down the vXX road ... such is my life :)
> 
> I recommend against it. I really haven't been convinced by any of your
> arguments for having a second topology tree. The existing topology tree
> in sysfs already better describes the links between hardware right now,
> except for the missing GPU links (and those should be addressable within
> the GPU community). Plus, maybe, some other enhancements to sockets/numa
> node descriptions if there's something missing there.
> 
> Then, 'hbind' is another issue but I suspect it would be better
> implemented as an ioctl on existing GPU interfaces. I certainly can't
> see any benefit in using it myself.
> 
> It's better to take an approach that would be less controversial with
> the community than to brow beat them with a patch set 20+ times until
> they take it.

So here is what i am gonna do because i need this code now. I am gonna
split the helper code that does policy and hbind out from its sysfs
peerage and i am gonna turn it into helpers that each device driver
can use. I will move the sysfs and syscall to be a patchset on its own
which use the exact same above infrastructure.

This means that i am loosing feature as it means that userspace can
not provide a list of multiple device memory to use (which is much more
common that you might think) but at least i can provide something for
the single device case through ioctl.

I am not giving up on sysfs or syscall as this is needed long term so
i am gonna improve it, port existing userspace (OpenCL, ROCm, ...) to
use it (in branch) and demonstrate how it get use by end application.
I will beat it again and again until either i convince people through
hard evidence or i get bored. I do not get bored easily :)

Cheers,
Jérôme


Re: [RFC PATCH 02/14] mm/hms: heterogenenous memory system (HMS) documentation

2018-12-05 Thread Jerome Glisse
On Wed, Dec 05, 2018 at 10:25:31AM -0700, Logan Gunthorpe wrote:
> 
> 
> On 2018-12-04 7:37 p.m., Jerome Glisse wrote:
> >>
> >> This came up before for apis even better defined than HMS as well as
> >> more limited scope, i.e. experimental ABI availability only for -rc
> >> kernels. Linus said this:
> >>
> >> "There are no loopholes. No "but it's been only one release". No, no,
> >> no. The whole point is that users are supposed to be able to *trust*
> >> the kernel. If we do something, we keep on doing it.
> >>
> >> And if it makes it harder to add new user-visible interfaces, then
> >> that's a *good* thing." [1]
> >>
> >> The takeaway being don't land work-in-progress ABIs in the kernel.
> >> Once an application depends on it, there are no more incompatible
> >> changes possible regardless of the warnings, experimental notices, or
> >> "staging" designation. DAX is experimental because there are cases
> >> where it currently does not work with respect to another kernel
> >> feature like xfs-reflink, RDMA. The plan is to fix those, not continue
> >> to hide behind an experimental designation, and fix them in a way that
> >> preserves the user visible behavior that has already been exposed,
> >> i.e. no regressions.
> >>
> >> [1]: 
> >> https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-August/004742.html
> > 
> > So i guess i am heading down the vXX road ... such is my life :)
> 
> I recommend against it. I really haven't been convinced by any of your
> arguments for having a second topology tree. The existing topology tree
> in sysfs already better describes the links between hardware right now,
> except for the missing GPU links (and those should be addressable within
> the GPU community). Plus, maybe, some other enhancements to sockets/numa
> node descriptions if there's something missing there.
> 
> Then, 'hbind' is another issue but I suspect it would be better
> implemented as an ioctl on existing GPU interfaces. I certainly can't
> see any benefit in using it myself.
> 
> It's better to take an approach that would be less controversial with
> the community than to brow beat them with a patch set 20+ times until
> they take it.

So here is what i am gonna do because i need this code now. I am gonna
split the helper code that does policy and hbind out from its sysfs
peerage and i am gonna turn it into helpers that each device driver
can use. I will move the sysfs and syscall to be a patchset on its own
which use the exact same above infrastructure.

This means that i am loosing feature as it means that userspace can
not provide a list of multiple device memory to use (which is much more
common that you might think) but at least i can provide something for
the single device case through ioctl.

I am not giving up on sysfs or syscall as this is needed long term so
i am gonna improve it, port existing userspace (OpenCL, ROCm, ...) to
use it (in branch) and demonstrate how it get use by end application.
I will beat it again and again until either i convince people through
hard evidence or i get bored. I do not get bored easily :)

Cheers,
Jérôme


Re: [PATCH v6 0/8] AXP8x3 AC and battery power supply support

2018-12-05 Thread Sebastian Reichel
Hi,

On Fri, Nov 30, 2018 at 04:31:28PM +0800, Chen-Yu Tsai wrote:
> On Wed, Nov 21, 2018 at 1:52 AM Oskari Lemmela  wrote:
> >
> > AXP813 AC power supply support with input current and
> > voltage limiting support.
> >
> > AXP803 AC and battery power supply support.
> >
> > Changes in v6:
> > * Collected tags
> > * Rebase to master
> > * Dropped AXP803 compatible patches
> >
> > Changes in v5:
> > * Return correct input current limit for values 0x6 and 0x7
> > * Add specific compatibles for AXP803
> > * Change commit messages
> > * Add Vasily Khoruzhick Pinebook DTS patch
> >
> > Changes in v4:
> > * Change order of axp20x-gpio in axp20x.c
> > * Fix indentation and spaces to tabs in axp20x.c
> >
> > Changes in v3:
> > * Reorder ac_power_supply DT node
> > * Rename axp20x_ac_power_set_property function
> > * Split mfd commit
> >
> > Changes in v2:
> > * Reuse axp813 compatibles for axp803
> > * Refactor axp20x_ac_power.c
> >
> >
> > Oskari Lemmela (7):
> >   dt-bindings: power: supply: axp20x: add AXP813 AC power DT binding
> >   ARM: dts: axp81x: add AC power supply subnode
> >   arm64: dts: allwinner: axp803: add AC and battery power supplies
> >   arm64: dts: allwinner: a64: sopine-baseboard: enable power supplies
> >   power: supply: add AC power supply driver for AXP813
> >   mfd: axp20x: Add AC power supply cell for AXP813
> >   mfd: axp20x: Add supported cells for AXP803
> >
> > Vasily Khoruzhick (1):
> >   arm64: dts: allwinner: a64: pinebook: enable power supplies
> 
> Since this series has been fully reviewed, and the devices it
> adds/enables aren't critical, i.e. won't cause the system to fail
> if the drivers are missing, I've merged the dts patches for 4.21.
> 
> Hopefully Sebastian and Lee will get around to merging the driver
> patches.

I queued the power-supply driver changes and prepared a signed
immutable tag for Lee:

The following changes since commit 651022382c7f8da46cb4872a545ee1da6d097d2a:

  Linux 4.20-rc1 (2018-11-04 15:37:52 -0800)

are available in the Git repository at:

  
ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply.git
 tags/psy-mfd-axp813-immutable-for-v4.21-signed

for you to fetch changes up to 7693b5643fd2d682de90733b67fc8032b9646911:

  power: supply: add AC power supply driver for AXP813 (2018-12-05 18:49:32 
+0100)


Immutable branch between mfd and power-supply for driver
changes related to axp813.

Signed-off-by: Sebastian Reichel 


Oskari Lemmela (2):
  dt-bindings: power: supply: axp20x: add AXP813 AC power DT binding
  power: supply: add AC power supply driver for AXP813

 .../bindings/power/supply/axp20x_ac_power.txt  |  3 +
 drivers/power/supply/axp20x_ac_power.c | 94 ++
 include/linux/mfd/axp20x.h |  1 +
 3 files changed, 98 insertions(+)


signature.asc
Description: PGP signature


Re: [PATCH v6 0/8] AXP8x3 AC and battery power supply support

2018-12-05 Thread Sebastian Reichel
Hi,

On Fri, Nov 30, 2018 at 04:31:28PM +0800, Chen-Yu Tsai wrote:
> On Wed, Nov 21, 2018 at 1:52 AM Oskari Lemmela  wrote:
> >
> > AXP813 AC power supply support with input current and
> > voltage limiting support.
> >
> > AXP803 AC and battery power supply support.
> >
> > Changes in v6:
> > * Collected tags
> > * Rebase to master
> > * Dropped AXP803 compatible patches
> >
> > Changes in v5:
> > * Return correct input current limit for values 0x6 and 0x7
> > * Add specific compatibles for AXP803
> > * Change commit messages
> > * Add Vasily Khoruzhick Pinebook DTS patch
> >
> > Changes in v4:
> > * Change order of axp20x-gpio in axp20x.c
> > * Fix indentation and spaces to tabs in axp20x.c
> >
> > Changes in v3:
> > * Reorder ac_power_supply DT node
> > * Rename axp20x_ac_power_set_property function
> > * Split mfd commit
> >
> > Changes in v2:
> > * Reuse axp813 compatibles for axp803
> > * Refactor axp20x_ac_power.c
> >
> >
> > Oskari Lemmela (7):
> >   dt-bindings: power: supply: axp20x: add AXP813 AC power DT binding
> >   ARM: dts: axp81x: add AC power supply subnode
> >   arm64: dts: allwinner: axp803: add AC and battery power supplies
> >   arm64: dts: allwinner: a64: sopine-baseboard: enable power supplies
> >   power: supply: add AC power supply driver for AXP813
> >   mfd: axp20x: Add AC power supply cell for AXP813
> >   mfd: axp20x: Add supported cells for AXP803
> >
> > Vasily Khoruzhick (1):
> >   arm64: dts: allwinner: a64: pinebook: enable power supplies
> 
> Since this series has been fully reviewed, and the devices it
> adds/enables aren't critical, i.e. won't cause the system to fail
> if the drivers are missing, I've merged the dts patches for 4.21.
> 
> Hopefully Sebastian and Lee will get around to merging the driver
> patches.

I queued the power-supply driver changes and prepared a signed
immutable tag for Lee:

The following changes since commit 651022382c7f8da46cb4872a545ee1da6d097d2a:

  Linux 4.20-rc1 (2018-11-04 15:37:52 -0800)

are available in the Git repository at:

  
ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply.git
 tags/psy-mfd-axp813-immutable-for-v4.21-signed

for you to fetch changes up to 7693b5643fd2d682de90733b67fc8032b9646911:

  power: supply: add AC power supply driver for AXP813 (2018-12-05 18:49:32 
+0100)


Immutable branch between mfd and power-supply for driver
changes related to axp813.

Signed-off-by: Sebastian Reichel 


Oskari Lemmela (2):
  dt-bindings: power: supply: axp20x: add AXP813 AC power DT binding
  power: supply: add AC power supply driver for AXP813

 .../bindings/power/supply/axp20x_ac_power.txt  |  3 +
 drivers/power/supply/axp20x_ac_power.c | 94 ++
 include/linux/mfd/axp20x.h |  1 +
 3 files changed, 98 insertions(+)


signature.asc
Description: PGP signature


Re: edac driver injection of uncorrected errors & utils

2018-12-05 Thread York Sun
On 12/5/18 8:38 AM, Tracy Smith wrote:
> This is more directed toward York for layerscape. I see some edac code
> that seem to do periodic scrubs based on intervals or scrub rate, but
> that is not needed for the layerscape driver to correct errors because
> errors are scrubbed when found by the edac scrub block or is it
> because the memory controller itself does the correction/scrubbing
> when an error is found?

Single-bit errors are corrected by memory controller without involving
software.

York


Re: edac driver injection of uncorrected errors & utils

2018-12-05 Thread York Sun
On 12/5/18 8:38 AM, Tracy Smith wrote:
> This is more directed toward York for layerscape. I see some edac code
> that seem to do periodic scrubs based on intervals or scrub rate, but
> that is not needed for the layerscape driver to correct errors because
> errors are scrubbed when found by the edac scrub block or is it
> because the memory controller itself does the correction/scrubbing
> when an error is found?

Single-bit errors are corrected by memory controller without involving
software.

York


Re: [PATCH] auxdisplay: charlcd: fix x/y command parsing

2018-12-05 Thread Måns Rullgård
Miguel Ojeda  writes:

> Hi Mans,
>
> [CC'ing a few people involved previously on this]
>
> On Wed, Dec 5, 2018 at 2:53 PM Mans Rullgard  wrote:
>>
>> Commit b34050fadb86 ("auxdisplay: charlcd: Fix and clean up handling of
>> x/y commands") fixed some problems by rewriting the parsing code,
>> but also broke things further by removing the check for a complete
>> command before attempting to parse it.  As a result, parsing is
>> terminated at the first x or y character.
>
> Why is it necessary to check for ";" to be exactly at the end? Or, in
> other words, what requires it?
>
> If the syntax supported by parse_xy() is wrong for some reason, we
> should fix that (instead of adding a check before calling it).

Suppose the command "\e[Lx0y0;" is written to the device.  The
charlcd_write_char() function adds one character at a time to the escape
sequence buffer.  Once the 'L' has been seen, it calls
handle_lcd_special_code() after each additional character is added.  On
encountering the 'x' this function will attempt to parse the command
using parse_xy(), which fails since it is incomplete.  It is nonetheless
reported as processed, and the escape sequence is flushed.  The
remainder of the command (i.e. "0y0;") is then displayed as text.

Since parse_xy() can never return success (true) unless there is a
semicolon, it is pointless to call it until we have seen one.  With the
characters being added to the buffer one by one, it is enough to check
for semicolon at the end.

BTW, the parsing of this command has been broken since 3.2-rc1 due to
commit 129957069e6a.

-- 
Måns Rullgård


Re: [PATCH] auxdisplay: charlcd: fix x/y command parsing

2018-12-05 Thread Måns Rullgård
Miguel Ojeda  writes:

> Hi Mans,
>
> [CC'ing a few people involved previously on this]
>
> On Wed, Dec 5, 2018 at 2:53 PM Mans Rullgard  wrote:
>>
>> Commit b34050fadb86 ("auxdisplay: charlcd: Fix and clean up handling of
>> x/y commands") fixed some problems by rewriting the parsing code,
>> but also broke things further by removing the check for a complete
>> command before attempting to parse it.  As a result, parsing is
>> terminated at the first x or y character.
>
> Why is it necessary to check for ";" to be exactly at the end? Or, in
> other words, what requires it?
>
> If the syntax supported by parse_xy() is wrong for some reason, we
> should fix that (instead of adding a check before calling it).

Suppose the command "\e[Lx0y0;" is written to the device.  The
charlcd_write_char() function adds one character at a time to the escape
sequence buffer.  Once the 'L' has been seen, it calls
handle_lcd_special_code() after each additional character is added.  On
encountering the 'x' this function will attempt to parse the command
using parse_xy(), which fails since it is incomplete.  It is nonetheless
reported as processed, and the escape sequence is flushed.  The
remainder of the command (i.e. "0y0;") is then displayed as text.

Since parse_xy() can never return success (true) unless there is a
semicolon, it is pointless to call it until we have seen one.  With the
characters being added to the buffer one by one, it is enough to check
for semicolon at the end.

BTW, the parsing of this command has been broken since 3.2-rc1 due to
commit 129957069e6a.

-- 
Måns Rullgård


Re: [PATCH 1/8] perf: Allow to block process in syscall tracepoints

2018-12-05 Thread Jiri Olsa
On Wed, Dec 05, 2018 at 12:35:03PM -0500, Steven Rostedt wrote:
> On Wed,  5 Dec 2018 17:05:02 +0100
> Jiri Olsa  wrote:
> 
> > diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> > index 3b2490b81918..e55cf9169a03 100644
> > --- a/arch/x86/entry/common.c
> > +++ b/arch/x86/entry/common.c
> > @@ -60,6 +60,32 @@ static void do_audit_syscall_entry(struct pt_regs *regs, 
> > u32 arch)
> > }
> >  }
> >  
> > +static void trace_block_syscall(struct pt_regs *regs, bool enter)
> > +{
> > +   current->perf_blocked = true;
> > +
> > +   do {
> > +   schedule_timeout(100 * HZ);
> > +   current->perf_blocked_cnt = 0;
> > +
> > +   if (enter) {
> > +   /* perf syscalls:* enter */
> > +   perf_trace_syscall_enter(regs);
> > +
> > +   /* perf raw_syscalls:* enter */
> > +   perf_trace_sys_enter(_sys_enter, regs, 
> > regs->orig_ax);
> > +   } else {
> > +   /* perf syscalls:* enter */
> > +   perf_trace_syscall_exit(regs);
> > +
> > +   /* perf raw_syscalls:* enter */
> > +   perf_trace_sys_exit(_sys_exit, regs, regs->ax);
> > +   }
> > +   } while (current->perf_blocked_cnt);
> 
> I was thinking, if the process reading the perf buffer dies, how do we
> tell this task to continue on?

when the tracer process dies, its event will get uninstalled from the
trace_event_call::perf_events list, so the next iteration of above loop
won't get any blocked event and it will leave (assuming there's just
single event)

jirka


Re: [PATCH 1/8] perf: Allow to block process in syscall tracepoints

2018-12-05 Thread Jiri Olsa
On Wed, Dec 05, 2018 at 12:35:03PM -0500, Steven Rostedt wrote:
> On Wed,  5 Dec 2018 17:05:02 +0100
> Jiri Olsa  wrote:
> 
> > diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> > index 3b2490b81918..e55cf9169a03 100644
> > --- a/arch/x86/entry/common.c
> > +++ b/arch/x86/entry/common.c
> > @@ -60,6 +60,32 @@ static void do_audit_syscall_entry(struct pt_regs *regs, 
> > u32 arch)
> > }
> >  }
> >  
> > +static void trace_block_syscall(struct pt_regs *regs, bool enter)
> > +{
> > +   current->perf_blocked = true;
> > +
> > +   do {
> > +   schedule_timeout(100 * HZ);
> > +   current->perf_blocked_cnt = 0;
> > +
> > +   if (enter) {
> > +   /* perf syscalls:* enter */
> > +   perf_trace_syscall_enter(regs);
> > +
> > +   /* perf raw_syscalls:* enter */
> > +   perf_trace_sys_enter(_sys_enter, regs, 
> > regs->orig_ax);
> > +   } else {
> > +   /* perf syscalls:* enter */
> > +   perf_trace_syscall_exit(regs);
> > +
> > +   /* perf raw_syscalls:* enter */
> > +   perf_trace_sys_exit(_sys_exit, regs, regs->ax);
> > +   }
> > +   } while (current->perf_blocked_cnt);
> 
> I was thinking, if the process reading the perf buffer dies, how do we
> tell this task to continue on?

when the tracer process dies, its event will get uninstalled from the
trace_event_call::perf_events list, so the next iteration of above loop
won't get any blocked event and it will leave (assuming there's just
single event)

jirka


Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()

2018-12-05 Thread Jerome Glisse
On Wed, Dec 05, 2018 at 09:27:09AM -0800, Dave Hansen wrote:
> On 12/4/18 6:13 PM, Jerome Glisse wrote:
> > On Tue, Dec 04, 2018 at 05:06:49PM -0800, Dave Hansen wrote:
> >> OK, but there are 1024*1024 matrix cells on a systems with 1024
> >> proximity domains (ACPI term for NUMA node).  So it sounds like you are
> >> proposing a million-directory approach.
> > 
> > No, pseudo code:
> > struct list links;
> > 
> > for (unsigned r = 0; r < nrows; r++) {
> > for (unsigned c = 0; c < ncolumns; c++) {
> > if (!link_find(links, hmat[r][c].bandwidth,
> >hmat[r][c].latency)) {
> > link = link_new(hmat[r][c].bandwidth,
> > hmat[r][c].latency);
> > // add initiator and target correspond to that row
> > // and columns to this new link
> > list_add(, links);
> > }
> > }
> > }
> > 
> > So all cells that have same property are under the same link. 
> 
> OK, so the "link" here is like a cable.  It's like saying, "we have a
> network and everything is connected with an ethernet cable that can do
> 1gbit/sec".
> 
> But, what actually connects an initiator to a target?  I assume we still
> need to know which link is used for each target/initiator pair.  Where
> is that enumerated?

ls /sys/bus/hms/devices/v0-0-link/
node0   power   subsystem   uevent
uid bandwidth   latency v0-1-target
v0-15-initiator v0-21-targetv0-4-initiator  v0-7-initiator
v0-10-initiator v0-13-initiator v0-16-initiator v0-2-initiator
v0-11-initiator v0-14-initiator v0-17-initiator v0-3-initiator
v0-5-initiator  v0-8-initiator  v0-6-initiator  v0-9-initiator
v0-12-initiator v0-10-initiator

So above is 16 CPUs (initiators*) and 2 targets all connected
through a common link. This means that all the initiators
connected to this link can access all the target connected to
this link. The bandwidth and latency is best case scenario
for instance when only one initiator is accessing the target.

Initiator can only access target they share a link with or
an extended path through a bridge. So if you have an initiator
connected to link0 and a target connected to link1 and there
is a bridge link0 to link1 then the initiator can access the
target memory in link1 but the bandwidth and latency will be
min(link0.bandwidth, link1.bandwidth, bridge.bandwidth)
min(link0.latency, link1.latency, bridge.latency)

You can really match one to one a link with bus in your
system. For instance with PCIE if you only have 16lanes
PCIE devices you only devince one link directory for all
your PCIE devices (ignore the PCIE peer to peer scenario
here). You add a bride between your PCIE link to your
NUMA node link (the node to which this PCIE root complex
belongs), this means that PCIE device can access the local
node memory with given bandwidth and latency (best case).


> 
> I think this just means we need a million symlinks to a "link" instead
> of a million link directories.  Still not great.
> 
> > Note that userspace can parse all this once during its initialization
> > and create pools of target to use.
> 
> It sounds like you're agreeing that there is too much data in this
> interface for applications to _regularly_ parse it.  We need some
> central thing that parses it all and caches the results.

No so there is 2 kinds of applications:
1) average one: i am using device {1, 3, 9} give me best memory for
   those devices
2) advance one: what is the topology of this system ? Parse the
   topology and partition its workload accordingly

For case 1 you can pre-parse stuff but this can be done by helper library
but for case 2 there is no amount of pre-parsing you can do in kernel, only
the application knows its own architecture and thus only the application
knows what matter in the topology. Is the application looking for big
chunk of memory even if it is slow ? Is it also looking for fast memory
close to X and Y ? ...

Each application will care about different thing and there is no telling
what its gonna be.

So what i am saying is that this information is likely to be parse once
by the application during startup ie the sysfs is not something that
is continuously read and parse by the application (unless application
also care about hotplug and then we are talking about the 1% of the 1%).

Cheers,
Jérôme


Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()

2018-12-05 Thread Jerome Glisse
On Wed, Dec 05, 2018 at 09:27:09AM -0800, Dave Hansen wrote:
> On 12/4/18 6:13 PM, Jerome Glisse wrote:
> > On Tue, Dec 04, 2018 at 05:06:49PM -0800, Dave Hansen wrote:
> >> OK, but there are 1024*1024 matrix cells on a systems with 1024
> >> proximity domains (ACPI term for NUMA node).  So it sounds like you are
> >> proposing a million-directory approach.
> > 
> > No, pseudo code:
> > struct list links;
> > 
> > for (unsigned r = 0; r < nrows; r++) {
> > for (unsigned c = 0; c < ncolumns; c++) {
> > if (!link_find(links, hmat[r][c].bandwidth,
> >hmat[r][c].latency)) {
> > link = link_new(hmat[r][c].bandwidth,
> > hmat[r][c].latency);
> > // add initiator and target correspond to that row
> > // and columns to this new link
> > list_add(, links);
> > }
> > }
> > }
> > 
> > So all cells that have same property are under the same link. 
> 
> OK, so the "link" here is like a cable.  It's like saying, "we have a
> network and everything is connected with an ethernet cable that can do
> 1gbit/sec".
> 
> But, what actually connects an initiator to a target?  I assume we still
> need to know which link is used for each target/initiator pair.  Where
> is that enumerated?

ls /sys/bus/hms/devices/v0-0-link/
node0   power   subsystem   uevent
uid bandwidth   latency v0-1-target
v0-15-initiator v0-21-targetv0-4-initiator  v0-7-initiator
v0-10-initiator v0-13-initiator v0-16-initiator v0-2-initiator
v0-11-initiator v0-14-initiator v0-17-initiator v0-3-initiator
v0-5-initiator  v0-8-initiator  v0-6-initiator  v0-9-initiator
v0-12-initiator v0-10-initiator

So above is 16 CPUs (initiators*) and 2 targets all connected
through a common link. This means that all the initiators
connected to this link can access all the target connected to
this link. The bandwidth and latency is best case scenario
for instance when only one initiator is accessing the target.

Initiator can only access target they share a link with or
an extended path through a bridge. So if you have an initiator
connected to link0 and a target connected to link1 and there
is a bridge link0 to link1 then the initiator can access the
target memory in link1 but the bandwidth and latency will be
min(link0.bandwidth, link1.bandwidth, bridge.bandwidth)
min(link0.latency, link1.latency, bridge.latency)

You can really match one to one a link with bus in your
system. For instance with PCIE if you only have 16lanes
PCIE devices you only devince one link directory for all
your PCIE devices (ignore the PCIE peer to peer scenario
here). You add a bride between your PCIE link to your
NUMA node link (the node to which this PCIE root complex
belongs), this means that PCIE device can access the local
node memory with given bandwidth and latency (best case).


> 
> I think this just means we need a million symlinks to a "link" instead
> of a million link directories.  Still not great.
> 
> > Note that userspace can parse all this once during its initialization
> > and create pools of target to use.
> 
> It sounds like you're agreeing that there is too much data in this
> interface for applications to _regularly_ parse it.  We need some
> central thing that parses it all and caches the results.

No so there is 2 kinds of applications:
1) average one: i am using device {1, 3, 9} give me best memory for
   those devices
2) advance one: what is the topology of this system ? Parse the
   topology and partition its workload accordingly

For case 1 you can pre-parse stuff but this can be done by helper library
but for case 2 there is no amount of pre-parsing you can do in kernel, only
the application knows its own architecture and thus only the application
knows what matter in the topology. Is the application looking for big
chunk of memory even if it is slow ? Is it also looking for fast memory
close to X and Y ? ...

Each application will care about different thing and there is no telling
what its gonna be.

So what i am saying is that this information is likely to be parse once
by the application during startup ie the sysfs is not something that
is continuously read and parse by the application (unless application
also care about hotplug and then we are talking about the 1% of the 1%).

Cheers,
Jérôme


Re: [RFT PATCH v1 0/4] Unify CPU topology across ARM64 & RISC-V

2018-12-05 Thread Jeffrey Hugo

On 11/29/2018 4:28 PM, Atish Patra wrote:

The cpu-map DT entry in ARM64 can describe the CPU topology in
much better way compared to other existing approaches. RISC-V can
easily adopt this binding to represent it's own CPU topology.
Thus, both cpu-map DT binding and topology parsing code can be
moved to a common location so that RISC-V or any other
architecture can leverage that.

The relevant discussion regarding unifying cpu topology can be
found in [1].

arch_topology seems to be a perfect place to move the common
code. I have not introduced any functional changes in the moved
code. The only downside in this approach is that the capacity
code will be executed for RISC-V as well. But, it will exit
immediately after not able to find the appropriate DT node. If
the overhead is considered too much, we can always compile out
capacity related functions under a different config for the
architectures that do not support them.

The patches have been tested for RISC-V and compile tested for
ARM64 & x86.

The socket change[2] is also now part of this series.

[1] https://lkml.org/lkml/2018/11/6/19
[2] https://lkml.org/lkml/2018/11/7/918

QEMU changes for RISC-V topology are available at

https://github.com/atishp04/riscv-qemu/tree/cpu_topo

Apologies for the previous patch series with incorrect title and
was sent only to kernel mailing list due to a bug in my config.
Please ignore that.

Atish Patra (3):
dt-binding: cpu-topology: Move cpu-map to a common binding.
cpu-topology: Move cpu topology code to common code.
RISC-V: Parse cpu topology during boot.

Sudeep Holla (1):
Documentation: DT: arm: add support for sockets defining package
boundaries

.../{arm/topology.txt => cpu/cpu-topology.txt} | 133 +++--
arch/arm64/include/asm/topology.h  |  22 --
arch/arm64/kernel/topology.c   | 303 +
arch/riscv/Kconfig |   1 +
arch/riscv/kernel/smpboot.c|   3 +
drivers/base/arch_topology.c   | 294 
include/linux/arch_topology.h  |  26 ++
include/linux/topology.h   |   1 +
8 files changed, 435 insertions(+), 348 deletions(-)
rename Documentation/devicetree/bindings/{arm/topology.txt => 
cpu/cpu-topology.txt} (66%)

--
2.7.4



Seems to test fine on QDF2400.

Tested-by: Jeffrey Hugo 

I did see that git am complained about patch #2 -

patch:103: space before tab in indent.
};
patch:114: space before tab in indent.
};
warning: 2 lines add whitespace errors.


--
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm 
Technologies, Inc.

Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.


Re: [RFT PATCH v1 0/4] Unify CPU topology across ARM64 & RISC-V

2018-12-05 Thread Jeffrey Hugo

On 11/29/2018 4:28 PM, Atish Patra wrote:

The cpu-map DT entry in ARM64 can describe the CPU topology in
much better way compared to other existing approaches. RISC-V can
easily adopt this binding to represent it's own CPU topology.
Thus, both cpu-map DT binding and topology parsing code can be
moved to a common location so that RISC-V or any other
architecture can leverage that.

The relevant discussion regarding unifying cpu topology can be
found in [1].

arch_topology seems to be a perfect place to move the common
code. I have not introduced any functional changes in the moved
code. The only downside in this approach is that the capacity
code will be executed for RISC-V as well. But, it will exit
immediately after not able to find the appropriate DT node. If
the overhead is considered too much, we can always compile out
capacity related functions under a different config for the
architectures that do not support them.

The patches have been tested for RISC-V and compile tested for
ARM64 & x86.

The socket change[2] is also now part of this series.

[1] https://lkml.org/lkml/2018/11/6/19
[2] https://lkml.org/lkml/2018/11/7/918

QEMU changes for RISC-V topology are available at

https://github.com/atishp04/riscv-qemu/tree/cpu_topo

Apologies for the previous patch series with incorrect title and
was sent only to kernel mailing list due to a bug in my config.
Please ignore that.

Atish Patra (3):
dt-binding: cpu-topology: Move cpu-map to a common binding.
cpu-topology: Move cpu topology code to common code.
RISC-V: Parse cpu topology during boot.

Sudeep Holla (1):
Documentation: DT: arm: add support for sockets defining package
boundaries

.../{arm/topology.txt => cpu/cpu-topology.txt} | 133 +++--
arch/arm64/include/asm/topology.h  |  22 --
arch/arm64/kernel/topology.c   | 303 +
arch/riscv/Kconfig |   1 +
arch/riscv/kernel/smpboot.c|   3 +
drivers/base/arch_topology.c   | 294 
include/linux/arch_topology.h  |  26 ++
include/linux/topology.h   |   1 +
8 files changed, 435 insertions(+), 348 deletions(-)
rename Documentation/devicetree/bindings/{arm/topology.txt => 
cpu/cpu-topology.txt} (66%)

--
2.7.4



Seems to test fine on QDF2400.

Tested-by: Jeffrey Hugo 

I did see that git am complained about patch #2 -

patch:103: space before tab in indent.
};
patch:114: space before tab in indent.
};
warning: 2 lines add whitespace errors.


--
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm 
Technologies, Inc.

Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.


Re: [PATCH v2 2/2] soc: fsl: dpio: use a cpumask to identify which cpus are unused

2018-12-05 Thread Li Yang
On Tue, Dec 4, 2018 at 5:23 AM Ioana Ciornei  wrote:
>
> The current implementation of the dpio driver uses a static next_cpu
> variable to keep track of the index of the next cpu available. This
> approach does not handle well unbinding and binding dpio devices in a
> random order. For example, unbinding a dpio and then binding it again
> with the driver, will generate the below error:
>
> $ echo dpio.5 > /sys/bus/fsl-mc/drivers/fsl_mc_dpio/unbind
> $ echo dpio.5 > /sys/bus/fsl-mc/drivers/fsl_mc_dpio/bind
> [  103.946380] fsl_mc_dpio dpio.5: probe failed. Number of DPIOs exceeds
> NR_CPUS.
> [  103.955157] fsl_mc_dpio dpio.5: fsl_mc_driver_probe failed: -34
> -bash: echo: write error: No such device
>
> Fix this error by keeping a global cpumask of unused cpus that will be
> updated at every dpaa2_dpio_[probe,remove].

Applied. Thanks.

>
> Signed-off-by: Ioana Ciornei 
> ---
> Changes in v2:
>   - added kernel-doc comment to the dpaa2_io_get_cpu function
>
>  drivers/soc/fsl/dpio/dpio-driver.c  | 25 -
>  drivers/soc/fsl/dpio/dpio-service.c | 13 +
>  include/soc/fsl/dpaa2-io.h  |  2 ++
>  3 files changed, 31 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/soc/fsl/dpio/dpio-driver.c 
> b/drivers/soc/fsl/dpio/dpio-driver.c
> index e58fcc9..832175c 100644
> --- a/drivers/soc/fsl/dpio/dpio-driver.c
> +++ b/drivers/soc/fsl/dpio/dpio-driver.c
> @@ -30,6 +30,8 @@ struct dpio_priv {
> struct dpaa2_io *io;
>  };
>
> +static cpumask_var_t cpus_unused_mask;
> +
>  static irqreturn_t dpio_irq_handler(int irq_num, void *arg)
>  {
> struct device *dev = (struct device *)arg;
> @@ -86,7 +88,7 @@ static int dpaa2_dpio_probe(struct fsl_mc_device *dpio_dev)
> struct dpio_priv *priv;
> int err = -ENOMEM;
> struct device *dev = _dev->dev;
> -   static int next_cpu = -1;
> +   int possible_next_cpu;
>
> priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> if (!priv)
> @@ -128,17 +130,14 @@ static int dpaa2_dpio_probe(struct fsl_mc_device 
> *dpio_dev)
> desc.dpio_id = dpio_dev->obj_desc.id;
>
> /* get the cpu to use for the affinity hint */
> -   if (next_cpu == -1)
> -   next_cpu = cpumask_first(cpu_online_mask);
> -   else
> -   next_cpu = cpumask_next(next_cpu, cpu_online_mask);
> -
> -   if (!cpu_possible(next_cpu)) {
> +   possible_next_cpu = cpumask_first(cpus_unused_mask);
> +   if (possible_next_cpu >= nr_cpu_ids) {
> dev_err(dev, "probe failed. Number of DPIOs exceeds 
> NR_CPUS.\n");
> err = -ERANGE;
> goto err_allocate_irqs;
> }
> -   desc.cpu = next_cpu;
> +   desc.cpu = possible_next_cpu;
> +   cpumask_clear_cpu(possible_next_cpu, cpus_unused_mask);
>
> /*
>  * Set the CENA regs to be the cache inhibited area of the portal to
> @@ -211,7 +210,7 @@ static int dpaa2_dpio_remove(struct fsl_mc_device 
> *dpio_dev)
>  {
> struct device *dev;
> struct dpio_priv *priv;
> -   int err;
> +   int err = 0, cpu;
>
> dev = _dev->dev;
> priv = dev_get_drvdata(dev);
> @@ -220,6 +219,9 @@ static int dpaa2_dpio_remove(struct fsl_mc_device 
> *dpio_dev)
>
> dpio_teardown_irqs(dpio_dev);
>
> +   cpu = dpaa2_io_get_cpu(priv->io);
> +   cpumask_set_cpu(cpu, cpus_unused_mask);
> +
> err = fsl_mc_portal_allocate(dpio_dev, 0, _dev->mc_io);
> if (err) {
> dev_err(dev, "MC portal allocation failed\n");
> @@ -267,11 +269,16 @@ static int dpaa2_dpio_remove(struct fsl_mc_device 
> *dpio_dev)
>
>  static int dpio_driver_init(void)
>  {
> +   if (!zalloc_cpumask_var(_unused_mask, GFP_KERNEL))
> +   return -ENOMEM;
> +   cpumask_copy(cpus_unused_mask, cpu_online_mask);
> +
> return fsl_mc_driver_register(_dpio_driver);
>  }
>
>  static void dpio_driver_exit(void)
>  {
> +   free_cpumask_var(cpus_unused_mask);
> fsl_mc_driver_unregister(_dpio_driver);
>  }
>  module_init(dpio_driver_init);
> diff --git a/drivers/soc/fsl/dpio/dpio-service.c 
> b/drivers/soc/fsl/dpio/dpio-service.c
> index 21c3e32..3b60258 100644
> --- a/drivers/soc/fsl/dpio/dpio-service.c
> +++ b/drivers/soc/fsl/dpio/dpio-service.c
> @@ -215,6 +215,19 @@ irqreturn_t dpaa2_io_irq(struct dpaa2_io *obj)
>  }
>
>  /**
> + * dpaa2_io_get_cpu() - get the cpu associated with a given DPIO object
> + *
> + * @d: the given DPIO object.
> + *
> + * Return the cpu associated with the DPIO object
> + */
> +int dpaa2_io_get_cpu(struct dpaa2_io *d)
> +{
> +   return d->dpio_desc.cpu;
> +}
> +EXPORT_SYMBOL(dpaa2_io_get_cpu);
> +
> +/**
>   * dpaa2_io_service_register() - Prepare for servicing of FQDAN or CDAN
>   *   notifications on the given DPIO service.
>   * @d:   the given DPIO service.
> diff --git a/include/soc/fsl/dpaa2-io.h b/include/soc/fsl/dpaa2-io.h
> index 

Re: [PATCH v2 2/2] soc: fsl: dpio: use a cpumask to identify which cpus are unused

2018-12-05 Thread Li Yang
On Tue, Dec 4, 2018 at 5:23 AM Ioana Ciornei  wrote:
>
> The current implementation of the dpio driver uses a static next_cpu
> variable to keep track of the index of the next cpu available. This
> approach does not handle well unbinding and binding dpio devices in a
> random order. For example, unbinding a dpio and then binding it again
> with the driver, will generate the below error:
>
> $ echo dpio.5 > /sys/bus/fsl-mc/drivers/fsl_mc_dpio/unbind
> $ echo dpio.5 > /sys/bus/fsl-mc/drivers/fsl_mc_dpio/bind
> [  103.946380] fsl_mc_dpio dpio.5: probe failed. Number of DPIOs exceeds
> NR_CPUS.
> [  103.955157] fsl_mc_dpio dpio.5: fsl_mc_driver_probe failed: -34
> -bash: echo: write error: No such device
>
> Fix this error by keeping a global cpumask of unused cpus that will be
> updated at every dpaa2_dpio_[probe,remove].

Applied. Thanks.

>
> Signed-off-by: Ioana Ciornei 
> ---
> Changes in v2:
>   - added kernel-doc comment to the dpaa2_io_get_cpu function
>
>  drivers/soc/fsl/dpio/dpio-driver.c  | 25 -
>  drivers/soc/fsl/dpio/dpio-service.c | 13 +
>  include/soc/fsl/dpaa2-io.h  |  2 ++
>  3 files changed, 31 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/soc/fsl/dpio/dpio-driver.c 
> b/drivers/soc/fsl/dpio/dpio-driver.c
> index e58fcc9..832175c 100644
> --- a/drivers/soc/fsl/dpio/dpio-driver.c
> +++ b/drivers/soc/fsl/dpio/dpio-driver.c
> @@ -30,6 +30,8 @@ struct dpio_priv {
> struct dpaa2_io *io;
>  };
>
> +static cpumask_var_t cpus_unused_mask;
> +
>  static irqreturn_t dpio_irq_handler(int irq_num, void *arg)
>  {
> struct device *dev = (struct device *)arg;
> @@ -86,7 +88,7 @@ static int dpaa2_dpio_probe(struct fsl_mc_device *dpio_dev)
> struct dpio_priv *priv;
> int err = -ENOMEM;
> struct device *dev = _dev->dev;
> -   static int next_cpu = -1;
> +   int possible_next_cpu;
>
> priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> if (!priv)
> @@ -128,17 +130,14 @@ static int dpaa2_dpio_probe(struct fsl_mc_device 
> *dpio_dev)
> desc.dpio_id = dpio_dev->obj_desc.id;
>
> /* get the cpu to use for the affinity hint */
> -   if (next_cpu == -1)
> -   next_cpu = cpumask_first(cpu_online_mask);
> -   else
> -   next_cpu = cpumask_next(next_cpu, cpu_online_mask);
> -
> -   if (!cpu_possible(next_cpu)) {
> +   possible_next_cpu = cpumask_first(cpus_unused_mask);
> +   if (possible_next_cpu >= nr_cpu_ids) {
> dev_err(dev, "probe failed. Number of DPIOs exceeds 
> NR_CPUS.\n");
> err = -ERANGE;
> goto err_allocate_irqs;
> }
> -   desc.cpu = next_cpu;
> +   desc.cpu = possible_next_cpu;
> +   cpumask_clear_cpu(possible_next_cpu, cpus_unused_mask);
>
> /*
>  * Set the CENA regs to be the cache inhibited area of the portal to
> @@ -211,7 +210,7 @@ static int dpaa2_dpio_remove(struct fsl_mc_device 
> *dpio_dev)
>  {
> struct device *dev;
> struct dpio_priv *priv;
> -   int err;
> +   int err = 0, cpu;
>
> dev = _dev->dev;
> priv = dev_get_drvdata(dev);
> @@ -220,6 +219,9 @@ static int dpaa2_dpio_remove(struct fsl_mc_device 
> *dpio_dev)
>
> dpio_teardown_irqs(dpio_dev);
>
> +   cpu = dpaa2_io_get_cpu(priv->io);
> +   cpumask_set_cpu(cpu, cpus_unused_mask);
> +
> err = fsl_mc_portal_allocate(dpio_dev, 0, _dev->mc_io);
> if (err) {
> dev_err(dev, "MC portal allocation failed\n");
> @@ -267,11 +269,16 @@ static int dpaa2_dpio_remove(struct fsl_mc_device 
> *dpio_dev)
>
>  static int dpio_driver_init(void)
>  {
> +   if (!zalloc_cpumask_var(_unused_mask, GFP_KERNEL))
> +   return -ENOMEM;
> +   cpumask_copy(cpus_unused_mask, cpu_online_mask);
> +
> return fsl_mc_driver_register(_dpio_driver);
>  }
>
>  static void dpio_driver_exit(void)
>  {
> +   free_cpumask_var(cpus_unused_mask);
> fsl_mc_driver_unregister(_dpio_driver);
>  }
>  module_init(dpio_driver_init);
> diff --git a/drivers/soc/fsl/dpio/dpio-service.c 
> b/drivers/soc/fsl/dpio/dpio-service.c
> index 21c3e32..3b60258 100644
> --- a/drivers/soc/fsl/dpio/dpio-service.c
> +++ b/drivers/soc/fsl/dpio/dpio-service.c
> @@ -215,6 +215,19 @@ irqreturn_t dpaa2_io_irq(struct dpaa2_io *obj)
>  }
>
>  /**
> + * dpaa2_io_get_cpu() - get the cpu associated with a given DPIO object
> + *
> + * @d: the given DPIO object.
> + *
> + * Return the cpu associated with the DPIO object
> + */
> +int dpaa2_io_get_cpu(struct dpaa2_io *d)
> +{
> +   return d->dpio_desc.cpu;
> +}
> +EXPORT_SYMBOL(dpaa2_io_get_cpu);
> +
> +/**
>   * dpaa2_io_service_register() - Prepare for servicing of FQDAN or CDAN
>   *   notifications on the given DPIO service.
>   * @d:   the given DPIO service.
> diff --git a/include/soc/fsl/dpaa2-io.h b/include/soc/fsl/dpaa2-io.h
> index 

Re: [PATCH v2 1/2] soc: fsl: dpio: cleanup the cpu array on dpaa2_io_down

2018-12-05 Thread Li Yang
On Tue, Dec 4, 2018 at 5:21 AM Ioana Ciornei  wrote:
>
> The dpio_by_cpu array should not contain a reference to a freed dpaa2_io
> object. This patch adds the necessary cleanup in dpaa2_io_down.

Applied.  Thanks.

>
> Signed-off-by: Ioana Ciornei 
> ---
> Changes in v2:
>   - none
>
>  drivers/soc/fsl/dpio/dpio-service.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/soc/fsl/dpio/dpio-service.c 
> b/drivers/soc/fsl/dpio/dpio-service.c
> index 9b17f72..21c3e32 100644
> --- a/drivers/soc/fsl/dpio/dpio-service.c
> +++ b/drivers/soc/fsl/dpio/dpio-service.c
> @@ -160,6 +160,11 @@ struct dpaa2_io *dpaa2_io_create(const struct 
> dpaa2_io_desc *desc)
>   */
>  void dpaa2_io_down(struct dpaa2_io *d)
>  {
> +   spin_lock(_list_lock);
> +   dpio_by_cpu[d->dpio_desc.cpu] = NULL;
> +   list_del(>node);
> +   spin_unlock(_list_lock);
> +
> kfree(d);
>  }
>
> --
> 1.9.1
>


Re: [PATCH v2 1/2] soc: fsl: dpio: cleanup the cpu array on dpaa2_io_down

2018-12-05 Thread Li Yang
On Tue, Dec 4, 2018 at 5:21 AM Ioana Ciornei  wrote:
>
> The dpio_by_cpu array should not contain a reference to a freed dpaa2_io
> object. This patch adds the necessary cleanup in dpaa2_io_down.

Applied.  Thanks.

>
> Signed-off-by: Ioana Ciornei 
> ---
> Changes in v2:
>   - none
>
>  drivers/soc/fsl/dpio/dpio-service.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/soc/fsl/dpio/dpio-service.c 
> b/drivers/soc/fsl/dpio/dpio-service.c
> index 9b17f72..21c3e32 100644
> --- a/drivers/soc/fsl/dpio/dpio-service.c
> +++ b/drivers/soc/fsl/dpio/dpio-service.c
> @@ -160,6 +160,11 @@ struct dpaa2_io *dpaa2_io_create(const struct 
> dpaa2_io_desc *desc)
>   */
>  void dpaa2_io_down(struct dpaa2_io *d)
>  {
> +   spin_lock(_list_lock);
> +   dpio_by_cpu[d->dpio_desc.cpu] = NULL;
> +   list_del(>node);
> +   spin_unlock(_list_lock);
> +
> kfree(d);
>  }
>
> --
> 1.9.1
>


Re: [PATCH 2/2] ASoC: DA7219: Implement error check on reg read and write

2018-12-05 Thread Daniel Kurtz
On Wed, Dec 5, 2018 at 4:28 AM Mark Brown  wrote:
>
> On Wed, Dec 05, 2018 at 10:21:04AM +, Adam Thomson wrote:
>
> > If the previous I2C access failed, how can we be sure that the write back 
> > to HW
> > of 0xFF even succeeds? More importantly these error returns won't 
> > necessarily
> > stop subsequent calls to controls within the Codec I believe, so you could 
> > still
> > see unwanted writes to HW via I2C, if I2C is sporadically operational. 
> > Again I
> > don't see this update resolving that. The key thing is to resolve why even 
> > just
> > one I2C transaction fails.
>
> Right, it's just not clear what we can constructively do if the I2C bus
> falls to bits other than log things and the I2C controllers will
> generally do that themselves.  There's no guarantee what made it
> through to the device or what will in future make it through to the
> device.

I agree, there is no guarantee here once things have gone wrong, and
the concerns above are reasonable.  However, in the real world, I2C
transactions do sometimes fail for various reasons.  The I2C (and
other bus) APIs have ways of reporting up their errors so callers can
take appropriate action.  This codec driver can run on all kinds of
hardware that can experience transient I2C errors, thus it sounds like
a reasonable idea to have the driver do some error checking on the
APIs it calls and take whatever action it can.  Just ignoring the
errors and proceeding like nothing is wrong is one option, but we can
probably do a little better by at least checking for errors, abort the
current operation, and pass up errors to higher layers when an i2c
transaction failure is detected.  If nothing else, this would enable
higher policy layers to take appropriate corrective action - for
example, if there is an i2c error when configuring a codec, it seems
advisable to report this up so that a machine driver would know to
abort and not turn on downstream amplifiers [I am assuming here that
something like this would happen, I don't know if the sound stack
really works this way].

Once the default "check, abort and report" error checking is in place,
we could perhaps think about actions that the driver could take to
recover from various failures, such as resetting the device or
unwinding previous transactions before aborting, or retrying  but
those are all policy, and this patch is more mechanism that enables
policy.

As for this patch itself, I would recommend using
snd_soc_component_read rather than snd_soc_component_read32() which is
fundamentally broken and afaict should be removed, since there is no
way to distinguish between its error return "(u32)-1" and the valid
register value 0x.  (Edit: I notice that you did this in v2,
so please ignore, but still replying here since this seems to be where
the active discussion is).

-Dan


Re: [PATCH 2/9] ACPI/IORT: Use helper functions to access dev->iommu_fwspec

2018-12-05 Thread Robin Murphy

On 04/12/2018 16:29, Joerg Roedel wrote:

From: Joerg Roedel 

Use the new helpers dev_iommu_fwspec_get()/set() to access
the dev->iommu_fwspec pointer. This makes it easier to move
that pointer later into another struct.

Cc: Lorenzo Pieralisi 
Signed-off-by: Joerg Roedel 
---
  drivers/acpi/arm64/iort.c | 12 +++-
  1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 70f4e80b9246..754a67ba49e5 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -779,7 +779,7 @@ static inline bool iort_iommu_driver_enabled(u8 type)
  static struct acpi_iort_node *iort_get_msi_resv_iommu(struct device *dev)
  {
struct acpi_iort_node *iommu;
-   struct iommu_fwspec *fwspec = dev->iommu_fwspec;
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
  
  	iommu = iort_get_iort_node(fwspec->iommu_fwnode);
  
@@ -824,6 +824,7 @@ static inline int iort_add_device_replay(const struct iommu_ops *ops,

   */
  int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head 
*head)
  {
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct acpi_iort_its_group *its;
struct acpi_iort_node *iommu_node, *its_node = NULL;
int i, resv = 0;
@@ -841,9 +842,9 @@ int iort_iommu_msi_get_resv_regions(struct device *dev, 
struct list_head *head)
 * a given PCI or named component may map IDs to.
 */
  
-	for (i = 0; i < dev->iommu_fwspec->num_ids; i++) {

+   for (i = 0; i < fwspec->num_ids; i++) {
its_node = iort_node_map_id(iommu_node,
-   dev->iommu_fwspec->ids[i],
+   fwspec->ids[i],
NULL, IORT_MSI_TYPE);
if (its_node)
break;
@@ -1036,6 +1037,7 @@ void iort_dma_setup(struct device *dev, u64 *dma_addr, 
u64 *dma_size)
   */
  const struct iommu_ops *iort_iommu_configure(struct device *dev)
  {
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct acpi_iort_node *node, *parent;
const struct iommu_ops *ops;
u32 streamid = 0;
@@ -1045,7 +1047,7 @@ const struct iommu_ops *iort_iommu_configure(struct 
device *dev)
 * If we already translated the fwspec there
 * is nothing left to do, return the iommu_ops.
 */
-   ops = iort_fwspec_iommu_ops(dev->iommu_fwspec);
+   ops = iort_fwspec_iommu_ops(fwspec);
if (ops)
return ops;
  
@@ -1084,7 +1086,7 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev)

 * add_device callback for dev, replay it to get things in order.
 */
if (!err) {
-   ops = iort_fwspec_iommu_ops(dev->iommu_fwspec);
+   ops = iort_fwspec_iommu_ops(fwspec);


This needs to reload the new fwspec initialised by iort_iommu_xlate(), 
in the same manner as the OF code. I think the best thing to do is 
encapsulate the dev_iommu_fwspec_get() call in iort_fwspec_iommu_ops(), 
and have that take dev as its argument directly.


Robin.


err = iort_add_device_replay(ops, dev);
}
  



Re: [PATCH 2/2] ASoC: DA7219: Implement error check on reg read and write

2018-12-05 Thread Daniel Kurtz
On Wed, Dec 5, 2018 at 4:28 AM Mark Brown  wrote:
>
> On Wed, Dec 05, 2018 at 10:21:04AM +, Adam Thomson wrote:
>
> > If the previous I2C access failed, how can we be sure that the write back 
> > to HW
> > of 0xFF even succeeds? More importantly these error returns won't 
> > necessarily
> > stop subsequent calls to controls within the Codec I believe, so you could 
> > still
> > see unwanted writes to HW via I2C, if I2C is sporadically operational. 
> > Again I
> > don't see this update resolving that. The key thing is to resolve why even 
> > just
> > one I2C transaction fails.
>
> Right, it's just not clear what we can constructively do if the I2C bus
> falls to bits other than log things and the I2C controllers will
> generally do that themselves.  There's no guarantee what made it
> through to the device or what will in future make it through to the
> device.

I agree, there is no guarantee here once things have gone wrong, and
the concerns above are reasonable.  However, in the real world, I2C
transactions do sometimes fail for various reasons.  The I2C (and
other bus) APIs have ways of reporting up their errors so callers can
take appropriate action.  This codec driver can run on all kinds of
hardware that can experience transient I2C errors, thus it sounds like
a reasonable idea to have the driver do some error checking on the
APIs it calls and take whatever action it can.  Just ignoring the
errors and proceeding like nothing is wrong is one option, but we can
probably do a little better by at least checking for errors, abort the
current operation, and pass up errors to higher layers when an i2c
transaction failure is detected.  If nothing else, this would enable
higher policy layers to take appropriate corrective action - for
example, if there is an i2c error when configuring a codec, it seems
advisable to report this up so that a machine driver would know to
abort and not turn on downstream amplifiers [I am assuming here that
something like this would happen, I don't know if the sound stack
really works this way].

Once the default "check, abort and report" error checking is in place,
we could perhaps think about actions that the driver could take to
recover from various failures, such as resetting the device or
unwinding previous transactions before aborting, or retrying  but
those are all policy, and this patch is more mechanism that enables
policy.

As for this patch itself, I would recommend using
snd_soc_component_read rather than snd_soc_component_read32() which is
fundamentally broken and afaict should be removed, since there is no
way to distinguish between its error return "(u32)-1" and the valid
register value 0x.  (Edit: I notice that you did this in v2,
so please ignore, but still replying here since this seems to be where
the active discussion is).

-Dan


Re: [PATCH 2/9] ACPI/IORT: Use helper functions to access dev->iommu_fwspec

2018-12-05 Thread Robin Murphy

On 04/12/2018 16:29, Joerg Roedel wrote:

From: Joerg Roedel 

Use the new helpers dev_iommu_fwspec_get()/set() to access
the dev->iommu_fwspec pointer. This makes it easier to move
that pointer later into another struct.

Cc: Lorenzo Pieralisi 
Signed-off-by: Joerg Roedel 
---
  drivers/acpi/arm64/iort.c | 12 +++-
  1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 70f4e80b9246..754a67ba49e5 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -779,7 +779,7 @@ static inline bool iort_iommu_driver_enabled(u8 type)
  static struct acpi_iort_node *iort_get_msi_resv_iommu(struct device *dev)
  {
struct acpi_iort_node *iommu;
-   struct iommu_fwspec *fwspec = dev->iommu_fwspec;
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
  
  	iommu = iort_get_iort_node(fwspec->iommu_fwnode);
  
@@ -824,6 +824,7 @@ static inline int iort_add_device_replay(const struct iommu_ops *ops,

   */
  int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head 
*head)
  {
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct acpi_iort_its_group *its;
struct acpi_iort_node *iommu_node, *its_node = NULL;
int i, resv = 0;
@@ -841,9 +842,9 @@ int iort_iommu_msi_get_resv_regions(struct device *dev, 
struct list_head *head)
 * a given PCI or named component may map IDs to.
 */
  
-	for (i = 0; i < dev->iommu_fwspec->num_ids; i++) {

+   for (i = 0; i < fwspec->num_ids; i++) {
its_node = iort_node_map_id(iommu_node,
-   dev->iommu_fwspec->ids[i],
+   fwspec->ids[i],
NULL, IORT_MSI_TYPE);
if (its_node)
break;
@@ -1036,6 +1037,7 @@ void iort_dma_setup(struct device *dev, u64 *dma_addr, 
u64 *dma_size)
   */
  const struct iommu_ops *iort_iommu_configure(struct device *dev)
  {
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct acpi_iort_node *node, *parent;
const struct iommu_ops *ops;
u32 streamid = 0;
@@ -1045,7 +1047,7 @@ const struct iommu_ops *iort_iommu_configure(struct 
device *dev)
 * If we already translated the fwspec there
 * is nothing left to do, return the iommu_ops.
 */
-   ops = iort_fwspec_iommu_ops(dev->iommu_fwspec);
+   ops = iort_fwspec_iommu_ops(fwspec);
if (ops)
return ops;
  
@@ -1084,7 +1086,7 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev)

 * add_device callback for dev, replay it to get things in order.
 */
if (!err) {
-   ops = iort_fwspec_iommu_ops(dev->iommu_fwspec);
+   ops = iort_fwspec_iommu_ops(fwspec);


This needs to reload the new fwspec initialised by iort_iommu_xlate(), 
in the same manner as the OF code. I think the best thing to do is 
encapsulate the dev_iommu_fwspec_get() call in iort_fwspec_iommu_ops(), 
and have that take dev as its argument directly.


Robin.


err = iort_add_device_replay(ops, dev);
}
  



Re: [PATCH v1 3/5] perf cs-etm: Support for NO_SYNC packet

2018-12-05 Thread Mathieu Poirier
On Tue, 4 Dec 2018 at 20:31,  wrote:
>
> On Mon, Nov 19, 2018 at 11:27:59AM -0700, Mathieu Poirier wrote:
> > On Sun, Nov 11, 2018 at 12:59:41PM +0800, Leo Yan wrote:
> > > As described in OpenCSD (CoreSight decoder lib), in the decoding stream
> > > it includes one trace element with type OCSD_GEN_TRC_ELEM_NO_SYNC; the
> > > element indicates 'either at start of decode, or after overflow / bad
> > > packet', we should take it as a signal for the tracing off and this will
> > > cause tracing discontinuity.  From the trace dump with 'perf script',
> > > sometimes the element OCSD_GEN_TRC_ELEM_NO_SYNC collaborates with
> > > element OCSD_GEN_TRC_ELEM_TRACE_ON to show the tracing flow have been
> > > turned off and on, in this case the cs-etm code has handled TRACE_ON
> > > packet well so we observe the tracing discontinuity; but in another case
> > > it only inserts the element OCSD_GEN_TRC_ELEM_NO_SYNC into instructions
> > > packets, we miss to handle the case if has only standalone NO_SYNC
> > > element and users cannot receive the info for tracing discontinuity.
> > >
> > > This patch introduces new type CS_ETM_TRACE_OFF to generate packet for
> > > receiving element OCSD_GEN_TRC_ELEM_NO_SYNC from decoder; when generate
> > > sample, CS_ETM_TRACE_OFF packet has almost the same behaviour with
> >
> > Here you have used the word "almost", leading me to beleive there are cases
> > where the handling of TRACE_ON and NO_SYNC packets differ, but the
> > implementation enacts the same behavior for both.
> >
> > Mike or Robert, can you confirm that TRACE_ON and NO_SYNC packets should be
> > treated the same way?
>
> I also would like to get suggestions/comments from Mike and Rob.
>
> For NO_SYNC packet, Mike before has given some explination for it:
>
> "looking at the decoder flow, when the decoder is reset, then it is
> returned to an unsynchronised state and a NO_SYNC will be output.
> With perf, it can concatenate multiple trace buffers into a single
> section of the perf.data file. To ensure that the decode process can
> see the start of a new buffer, the drivers will insert a barrier
> packet between different buffers so that the decoder can spot the
> boundaries. When the decoder hits a barrier packet it will
> automatically reset so that it correctly decodes the next trace
> buffer. This could be what you are seeing here."
>
> So I think there still have some difference between TRACE_ON and
> NO_SYNC packets, TRACE_ON packet indicates the start of trace and it's
> also possible caused by tracing discontinuity; NO_SYNC packets usually
> caused by an unsynchronised state.  But both of them presents
> discontinuity in perf trace data.
>
> I prefer to use two different packet types to present them, this can
> let the code to be more readable and easier to be maintained later.

Absolutely.  Make sure to get to the bottom of the story with Mike
and/or Robert before sending your next patchset.

>
> If you agree with this, I also will rephrase the commit log to avoid
> confusion that these two packets are the same thing.

Yes please.

>
> > Leo, if handling of the TRACE_ON and NO_SYNC packets is the same then we 
> > should
> > treat them the same way in cs_etm_decoder__gen_trace_elem_printer(), i.e 
> > simply
> > call cs_etm_decoder__buffer_trace_on().  That way packet handling in 
> > cs-etm.c
> > doesn't change.  Otherwise see my comments below.
> >
> > > CS_ETM_TRACE_ON packet: both of them invokes cs_etm__flush() to generate
> > > samples for the previous instructions packet, and in cs_etm__sample() it
> > > also needs to generate samples if TRACE_OFF packet is followed by one
> > > sequential instructions packet.  This patch also converts the address to
> > > 0 for TRACE_OFF packet, this is same with TRACE_ON packet as well.
> > >
> > > Signed-off-by: Leo Yan 
> > > ---
> > >  tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 10 ++
> > >  tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |  7 ---
> > >  tools/perf/util/cs-etm.c| 15 +++
> > >  3 files changed, 25 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c 
> > > b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> > > index 5efb616..9d52727 100644
> > > --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> > > +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> > > @@ -369,6 +369,14 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder 
> > > *decoder,
> > >  }
> > >
> > >  static ocsd_datapath_resp_t
> > > +cs_etm_decoder__buffer_trace_off(struct cs_etm_decoder *decoder,
> > > +const uint8_t trace_chan_id)
> > > +{
> > > +   return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
> > > +CS_ETM_TRACE_OFF);
> > > +}
> > > +
> > > +static ocsd_datapath_resp_t
> > >  cs_etm_decoder__buffer_trace_on(struct cs_etm_decoder *decoder,
> > > const uint8_t 

Re: [PATCH v1 3/5] perf cs-etm: Support for NO_SYNC packet

2018-12-05 Thread Mathieu Poirier
On Tue, 4 Dec 2018 at 20:31,  wrote:
>
> On Mon, Nov 19, 2018 at 11:27:59AM -0700, Mathieu Poirier wrote:
> > On Sun, Nov 11, 2018 at 12:59:41PM +0800, Leo Yan wrote:
> > > As described in OpenCSD (CoreSight decoder lib), in the decoding stream
> > > it includes one trace element with type OCSD_GEN_TRC_ELEM_NO_SYNC; the
> > > element indicates 'either at start of decode, or after overflow / bad
> > > packet', we should take it as a signal for the tracing off and this will
> > > cause tracing discontinuity.  From the trace dump with 'perf script',
> > > sometimes the element OCSD_GEN_TRC_ELEM_NO_SYNC collaborates with
> > > element OCSD_GEN_TRC_ELEM_TRACE_ON to show the tracing flow have been
> > > turned off and on, in this case the cs-etm code has handled TRACE_ON
> > > packet well so we observe the tracing discontinuity; but in another case
> > > it only inserts the element OCSD_GEN_TRC_ELEM_NO_SYNC into instructions
> > > packets, we miss to handle the case if has only standalone NO_SYNC
> > > element and users cannot receive the info for tracing discontinuity.
> > >
> > > This patch introduces new type CS_ETM_TRACE_OFF to generate packet for
> > > receiving element OCSD_GEN_TRC_ELEM_NO_SYNC from decoder; when generate
> > > sample, CS_ETM_TRACE_OFF packet has almost the same behaviour with
> >
> > Here you have used the word "almost", leading me to beleive there are cases
> > where the handling of TRACE_ON and NO_SYNC packets differ, but the
> > implementation enacts the same behavior for both.
> >
> > Mike or Robert, can you confirm that TRACE_ON and NO_SYNC packets should be
> > treated the same way?
>
> I also would like to get suggestions/comments from Mike and Rob.
>
> For NO_SYNC packet, Mike before has given some explination for it:
>
> "looking at the decoder flow, when the decoder is reset, then it is
> returned to an unsynchronised state and a NO_SYNC will be output.
> With perf, it can concatenate multiple trace buffers into a single
> section of the perf.data file. To ensure that the decode process can
> see the start of a new buffer, the drivers will insert a barrier
> packet between different buffers so that the decoder can spot the
> boundaries. When the decoder hits a barrier packet it will
> automatically reset so that it correctly decodes the next trace
> buffer. This could be what you are seeing here."
>
> So I think there still have some difference between TRACE_ON and
> NO_SYNC packets, TRACE_ON packet indicates the start of trace and it's
> also possible caused by tracing discontinuity; NO_SYNC packets usually
> caused by an unsynchronised state.  But both of them presents
> discontinuity in perf trace data.
>
> I prefer to use two different packet types to present them, this can
> let the code to be more readable and easier to be maintained later.

Absolutely.  Make sure to get to the bottom of the story with Mike
and/or Robert before sending your next patchset.

>
> If you agree with this, I also will rephrase the commit log to avoid
> confusion that these two packets are the same thing.

Yes please.

>
> > Leo, if handling of the TRACE_ON and NO_SYNC packets is the same then we 
> > should
> > treat them the same way in cs_etm_decoder__gen_trace_elem_printer(), i.e 
> > simply
> > call cs_etm_decoder__buffer_trace_on().  That way packet handling in 
> > cs-etm.c
> > doesn't change.  Otherwise see my comments below.
> >
> > > CS_ETM_TRACE_ON packet: both of them invokes cs_etm__flush() to generate
> > > samples for the previous instructions packet, and in cs_etm__sample() it
> > > also needs to generate samples if TRACE_OFF packet is followed by one
> > > sequential instructions packet.  This patch also converts the address to
> > > 0 for TRACE_OFF packet, this is same with TRACE_ON packet as well.
> > >
> > > Signed-off-by: Leo Yan 
> > > ---
> > >  tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 10 ++
> > >  tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |  7 ---
> > >  tools/perf/util/cs-etm.c| 15 +++
> > >  3 files changed, 25 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c 
> > > b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> > > index 5efb616..9d52727 100644
> > > --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> > > +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
> > > @@ -369,6 +369,14 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder 
> > > *decoder,
> > >  }
> > >
> > >  static ocsd_datapath_resp_t
> > > +cs_etm_decoder__buffer_trace_off(struct cs_etm_decoder *decoder,
> > > +const uint8_t trace_chan_id)
> > > +{
> > > +   return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
> > > +CS_ETM_TRACE_OFF);
> > > +}
> > > +
> > > +static ocsd_datapath_resp_t
> > >  cs_etm_decoder__buffer_trace_on(struct cs_etm_decoder *decoder,
> > > const uint8_t 

Re: [PATCH 1/2] ARM: Remove '-p' from LDFLAGS

2018-12-05 Thread Nick Desaulniers
On Tue, Dec 4, 2018 at 5:42 PM Nathan Chancellor
 wrote:
>
> This option is not supported by lld:
>
> ld.lld: error: unknown argument: -p
>
> This has been a no-op in binutils since 2004 (see commit dea514f51da1 in
> that tree). Given that the lowest officially supported of binutils for
> the kernel is 2.20, which was released in 2009, nobody needs this flag
> around so just remove it. Commit 1a381d4a0a9a ("arm64: remove no-op -p
> linker flag") did the same for arm64.

While investigating the context of 1a381d4a0a9a for review, I found
that bfd had removed -p's implementation a long time ago, but kept the
flag as a no-op for compatibility.  I asked that the patch be extended
to arm as well, but don't think a v2 was ever cut.

Thanks for sending this Nathan.
Reviewed-by: Nick Desaulniers 

>
> Signed-off-by: Nathan Chancellor 
> ---
>  arch/arm/Makefile | 2 +-
>  arch/arm/boot/compressed/Makefile | 2 --
>  2 files changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/arch/arm/Makefile b/arch/arm/Makefile
> index 05a91d8b89f3..e2a0baf36766 100644
> --- a/arch/arm/Makefile
> +++ b/arch/arm/Makefile
> @@ -10,7 +10,7 @@
>  #
>  # Copyright (C) 1995-2001 by Russell King
>
> -LDFLAGS_vmlinux:=-p --no-undefined -X --pic-veneer
> +LDFLAGS_vmlinux:= --no-undefined -X --pic-veneer
>  ifeq ($(CONFIG_CPU_ENDIAN_BE8),y)
>  LDFLAGS_vmlinux+= --be8
>  KBUILD_LDFLAGS_MODULE  += --be8
> diff --git a/arch/arm/boot/compressed/Makefile 
> b/arch/arm/boot/compressed/Makefile
> index 1f5a5ffe7fcf..dcd07bd24b85 100644
> --- a/arch/arm/boot/compressed/Makefile
> +++ b/arch/arm/boot/compressed/Makefile
> @@ -131,8 +131,6 @@ endif
>  ifeq ($(CONFIG_CPU_ENDIAN_BE8),y)
>  LDFLAGS_vmlinux += --be8
>  endif
> -# ?
> -LDFLAGS_vmlinux += -p
>  # Report unresolved symbol references
>  LDFLAGS_vmlinux += --no-undefined
>  # Delete all temporary local symbols
> --
> 2.20.0.rc1
>


-- 
Thanks,
~Nick Desaulniers


Re: [PATCH 1/2] ARM: Remove '-p' from LDFLAGS

2018-12-05 Thread Nick Desaulniers
On Tue, Dec 4, 2018 at 5:42 PM Nathan Chancellor
 wrote:
>
> This option is not supported by lld:
>
> ld.lld: error: unknown argument: -p
>
> This has been a no-op in binutils since 2004 (see commit dea514f51da1 in
> that tree). Given that the lowest officially supported of binutils for
> the kernel is 2.20, which was released in 2009, nobody needs this flag
> around so just remove it. Commit 1a381d4a0a9a ("arm64: remove no-op -p
> linker flag") did the same for arm64.

While investigating the context of 1a381d4a0a9a for review, I found
that bfd had removed -p's implementation a long time ago, but kept the
flag as a no-op for compatibility.  I asked that the patch be extended
to arm as well, but don't think a v2 was ever cut.

Thanks for sending this Nathan.
Reviewed-by: Nick Desaulniers 

>
> Signed-off-by: Nathan Chancellor 
> ---
>  arch/arm/Makefile | 2 +-
>  arch/arm/boot/compressed/Makefile | 2 --
>  2 files changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/arch/arm/Makefile b/arch/arm/Makefile
> index 05a91d8b89f3..e2a0baf36766 100644
> --- a/arch/arm/Makefile
> +++ b/arch/arm/Makefile
> @@ -10,7 +10,7 @@
>  #
>  # Copyright (C) 1995-2001 by Russell King
>
> -LDFLAGS_vmlinux:=-p --no-undefined -X --pic-veneer
> +LDFLAGS_vmlinux:= --no-undefined -X --pic-veneer
>  ifeq ($(CONFIG_CPU_ENDIAN_BE8),y)
>  LDFLAGS_vmlinux+= --be8
>  KBUILD_LDFLAGS_MODULE  += --be8
> diff --git a/arch/arm/boot/compressed/Makefile 
> b/arch/arm/boot/compressed/Makefile
> index 1f5a5ffe7fcf..dcd07bd24b85 100644
> --- a/arch/arm/boot/compressed/Makefile
> +++ b/arch/arm/boot/compressed/Makefile
> @@ -131,8 +131,6 @@ endif
>  ifeq ($(CONFIG_CPU_ENDIAN_BE8),y)
>  LDFLAGS_vmlinux += --be8
>  endif
> -# ?
> -LDFLAGS_vmlinux += -p
>  # Report unresolved symbol references
>  LDFLAGS_vmlinux += --no-undefined
>  # Delete all temporary local symbols
> --
> 2.20.0.rc1
>


-- 
Thanks,
~Nick Desaulniers


Re: rcu_preempt caused oom

2018-12-05 Thread Paul E. McKenney
On Wed, Dec 05, 2018 at 08:42:54AM +, He, Bo wrote:
> I double checked the .config, we don't enable CONFIG_NO_HZ_FULL .
> Our previous logs can dump all the task backtrace, and kthread (the 
> rcu_preempt, rcu_sched, and rcu_bh tasks) are all in "I" state not in "R" 
> state, my understandings are if it's the side-effect of causing RCU's 
> kthreads to be run at SCHED_FIFO priority 1, the kthreads should be in R 
> state.

Hmmm...  Well, the tasks could in theory be waiting on a blocking mutex.
But in practice the grace-period kthreads wait on events, so that makes
no sense.

Is it possible for you to dump out the grace-period kthread's stack,
for example, with sysreq-t?  (Steve might know a better way to do this.)

> I will do more experiments and keep you update once we have more findings:
> 1. set the kthread priority to SCHED_FIFO without CONFIG_RCU_BOOST and see if 
> the issue can reproduce.

That sounds like a most excellent experiment!

> 2. check more ftrace to double confirm why there is no 
> trace_rcu_quiescent_state_report and most of the trace_rcu_grace_period are 
> in "AccWaitCB".

As noted earlier, to see something interesting, you will need to start
the ftrace before the grace period starts.  This would probably mean
having ftrace running before starting the test.  Starting the ftrace
after the hang commences is unlikely to produce useful information.

Thanx, Paul

> -Original Message-
> From: Paul E. McKenney  
> Sent: Wednesday, December 5, 2018 3:50 AM
> To: He, Bo 
> Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; 
> j...@joshtriplett.org; mathieu.desnoy...@efficios.com; 
> jiangshan...@gmail.com; Zhang, Jun ; Xiao, Jin 
> ; Zhang, Yanmin ; Bai, Jie A 
> 
> Subject: Re: rcu_preempt caused oom
> 
> On Tue, Dec 04, 2018 at 07:50:04AM +, He, Bo wrote:
> > Hi, Paul:
> > the enclosed is the log trigger the 120s hung_task_panic without other 
> > debug patches, the hung task is blocked at __wait_rcu_gp, it means the 
> > rcu_cpu_stall can't detect the scenario:
> > echo 1 > /proc/sys/kernel/panic_on_rcu_stall
> > echo 7 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
> 
> Not necessarily.  If there is an RCU CPU stall warning, blocking within
> __wait_rcu_gp() is expected behavior.  It is possible that the problem is 
> that although the grace period is completing as required, the callbacks are 
> not being invoked in a timely fashion.  And that could happen if you had 
> CONFIG_NO_HZ_FULL and a bunch of nohz_full CPUs, or, alternatively, callback 
> offloading enabled.  But I don't see these in your previous emails.  Another 
> possible cause is that the grace-period kthread is being delayed, so that the 
> grace period never starts.  This seems unlikely, but it is the only thing 
> thus far that matches the symptoms.
> 
> CONFIG_RCU_BOOST=y has the side-effect of causing RCU's kthreads to be run at 
> SCHED_FIFO priority 1, and that would help in the case where RCU's 
> grace-period kthread (the rcu_preempt, rcu_sched, and rcu_bh tasks, all of 
> which execute in the rcu_gp_kthread() function) was being starved of CPU time.
> 
> Does that sound likely?
> 
>   Thanx, Paul
> 
> > -Original Message-
> > From: Paul E. McKenney 
> > Sent: Monday, December 3, 2018 9:57 PM
> > To: He, Bo 
> > Cc: Steven Rostedt ; 
> > linux-kernel@vger.kernel.org; j...@joshtriplett.org; 
> > mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Zhang, Jun 
> > ; Xiao, Jin ; Zhang, Yanmin 
> > 
> > Subject: Re: rcu_preempt caused oom
> > 
> > On Mon, Dec 03, 2018 at 07:44:03AM +, He, Bo wrote:
> > > Thanks, we have run the test for the whole weekend and not reproduce the 
> > > issue,  so we confirm the CONFIG_RCU_BOOST can fix the issue.
> > 
> > Very good, that is encouraging.  Perhaps I should think about making 
> > CONFIG_RCU_BOOST=y the default for CONFIG_PREEMPT in mainline, at least for 
> > architectures for which rt_mutexes are implemented.
> > 
> > > We have enabled the rcupdate.rcu_cpu_stall_timeout=7 and also set panic 
> > > on rcu stall and will see if we can see the panic, will keep you posed 
> > > with the test results.
> > > echo 1 > /proc/sys/kernel/panic_on_rcu_stall
> > 
> > Looking forward to seeing what is going on!  Of course, to reproduce, you 
> > will need to again build with CONFIG_RCU_BOOST=n.
> > 
> > Thanx, Paul
> > 
> > > -Original Message-
> > > From: Paul E. McKenney 
> > > Sent: Saturday, December 1, 2018 12:49 AM
> > > To: He, Bo 
> > > Cc: Steven Rostedt ; 
> > > linux-kernel@vger.kernel.org; j...@joshtriplett.org; 
> > > mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Zhang, Jun 
> > > ; Xiao, Jin ; Zhang, Yanmin 
> > > 
> > > Subject: Re: rcu_preempt caused oom
> > > 
> > > On Fri, Nov 30, 2018 at 03:18:58PM +, He, Bo wrote:
> > > > Here is the kernel 

Re: rcu_preempt caused oom

2018-12-05 Thread Paul E. McKenney
On Wed, Dec 05, 2018 at 08:42:54AM +, He, Bo wrote:
> I double checked the .config, we don't enable CONFIG_NO_HZ_FULL .
> Our previous logs can dump all the task backtrace, and kthread (the 
> rcu_preempt, rcu_sched, and rcu_bh tasks) are all in "I" state not in "R" 
> state, my understandings are if it's the side-effect of causing RCU's 
> kthreads to be run at SCHED_FIFO priority 1, the kthreads should be in R 
> state.

Hmmm...  Well, the tasks could in theory be waiting on a blocking mutex.
But in practice the grace-period kthreads wait on events, so that makes
no sense.

Is it possible for you to dump out the grace-period kthread's stack,
for example, with sysreq-t?  (Steve might know a better way to do this.)

> I will do more experiments and keep you update once we have more findings:
> 1. set the kthread priority to SCHED_FIFO without CONFIG_RCU_BOOST and see if 
> the issue can reproduce.

That sounds like a most excellent experiment!

> 2. check more ftrace to double confirm why there is no 
> trace_rcu_quiescent_state_report and most of the trace_rcu_grace_period are 
> in "AccWaitCB".

As noted earlier, to see something interesting, you will need to start
the ftrace before the grace period starts.  This would probably mean
having ftrace running before starting the test.  Starting the ftrace
after the hang commences is unlikely to produce useful information.

Thanx, Paul

> -Original Message-
> From: Paul E. McKenney  
> Sent: Wednesday, December 5, 2018 3:50 AM
> To: He, Bo 
> Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; 
> j...@joshtriplett.org; mathieu.desnoy...@efficios.com; 
> jiangshan...@gmail.com; Zhang, Jun ; Xiao, Jin 
> ; Zhang, Yanmin ; Bai, Jie A 
> 
> Subject: Re: rcu_preempt caused oom
> 
> On Tue, Dec 04, 2018 at 07:50:04AM +, He, Bo wrote:
> > Hi, Paul:
> > the enclosed is the log trigger the 120s hung_task_panic without other 
> > debug patches, the hung task is blocked at __wait_rcu_gp, it means the 
> > rcu_cpu_stall can't detect the scenario:
> > echo 1 > /proc/sys/kernel/panic_on_rcu_stall
> > echo 7 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
> 
> Not necessarily.  If there is an RCU CPU stall warning, blocking within
> __wait_rcu_gp() is expected behavior.  It is possible that the problem is 
> that although the grace period is completing as required, the callbacks are 
> not being invoked in a timely fashion.  And that could happen if you had 
> CONFIG_NO_HZ_FULL and a bunch of nohz_full CPUs, or, alternatively, callback 
> offloading enabled.  But I don't see these in your previous emails.  Another 
> possible cause is that the grace-period kthread is being delayed, so that the 
> grace period never starts.  This seems unlikely, but it is the only thing 
> thus far that matches the symptoms.
> 
> CONFIG_RCU_BOOST=y has the side-effect of causing RCU's kthreads to be run at 
> SCHED_FIFO priority 1, and that would help in the case where RCU's 
> grace-period kthread (the rcu_preempt, rcu_sched, and rcu_bh tasks, all of 
> which execute in the rcu_gp_kthread() function) was being starved of CPU time.
> 
> Does that sound likely?
> 
>   Thanx, Paul
> 
> > -Original Message-
> > From: Paul E. McKenney 
> > Sent: Monday, December 3, 2018 9:57 PM
> > To: He, Bo 
> > Cc: Steven Rostedt ; 
> > linux-kernel@vger.kernel.org; j...@joshtriplett.org; 
> > mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Zhang, Jun 
> > ; Xiao, Jin ; Zhang, Yanmin 
> > 
> > Subject: Re: rcu_preempt caused oom
> > 
> > On Mon, Dec 03, 2018 at 07:44:03AM +, He, Bo wrote:
> > > Thanks, we have run the test for the whole weekend and not reproduce the 
> > > issue,  so we confirm the CONFIG_RCU_BOOST can fix the issue.
> > 
> > Very good, that is encouraging.  Perhaps I should think about making 
> > CONFIG_RCU_BOOST=y the default for CONFIG_PREEMPT in mainline, at least for 
> > architectures for which rt_mutexes are implemented.
> > 
> > > We have enabled the rcupdate.rcu_cpu_stall_timeout=7 and also set panic 
> > > on rcu stall and will see if we can see the panic, will keep you posed 
> > > with the test results.
> > > echo 1 > /proc/sys/kernel/panic_on_rcu_stall
> > 
> > Looking forward to seeing what is going on!  Of course, to reproduce, you 
> > will need to again build with CONFIG_RCU_BOOST=n.
> > 
> > Thanx, Paul
> > 
> > > -Original Message-
> > > From: Paul E. McKenney 
> > > Sent: Saturday, December 1, 2018 12:49 AM
> > > To: He, Bo 
> > > Cc: Steven Rostedt ; 
> > > linux-kernel@vger.kernel.org; j...@joshtriplett.org; 
> > > mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Zhang, Jun 
> > > ; Xiao, Jin ; Zhang, Yanmin 
> > > 
> > > Subject: Re: rcu_preempt caused oom
> > > 
> > > On Fri, Nov 30, 2018 at 03:18:58PM +, He, Bo wrote:
> > > > Here is the kernel 

Re: [PATCH 14/14] power: supply: axp288: use the BIT() macro

2018-12-05 Thread Sebastian Reichel
Hi,

On Mon, Nov 26, 2018 at 05:27:55PM +0200, Priit Laes wrote:
> From: Olliver Schinagl 
> 
> Make use of the recommended BIT() macro for bit defines.
> 
> Signed-off-by: Olliver Schinagl 
> Signed-off-by: Priit Laes 
> ---

Thanks, queued to power-supply-next.

-- Sebastian

>  drivers/power/supply/axp288_charger.c | 35 ++--
>  1 file changed, 18 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/power/supply/axp288_charger.c 
> b/drivers/power/supply/axp288_charger.c
> index 735658e..f8c6da9 100644
> --- a/drivers/power/supply/axp288_charger.c
> +++ b/drivers/power/supply/axp288_charger.c
> @@ -16,6 +16,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -29,17 +30,17 @@
>  #include 
>  #include 
>  
> -#define PS_STAT_VBUS_TRIGGER (1 << 0)
> -#define PS_STAT_BAT_CHRG_DIR (1 << 2)
> -#define PS_STAT_VBAT_ABOVE_VHOLD (1 << 3)
> -#define PS_STAT_VBUS_VALID   (1 << 4)
> -#define PS_STAT_VBUS_PRESENT (1 << 5)
> +#define PS_STAT_VBUS_TRIGGER BIT(0)
> +#define PS_STAT_BAT_CHRG_DIR BIT(2)
> +#define PS_STAT_VBAT_ABOVE_VHOLD BIT(3)
> +#define PS_STAT_VBUS_VALID   BIT(4)
> +#define PS_STAT_VBUS_PRESENT BIT(5)
>  
> -#define CHRG_STAT_BAT_SAFE_MODE  (1 << 3)
> -#define CHRG_STAT_BAT_VALID  (1 << 4)
> -#define CHRG_STAT_BAT_PRESENT(1 << 5)
> -#define CHRG_STAT_CHARGING   (1 << 6)
> -#define CHRG_STAT_PMIC_OTP   (1 << 7)
> +#define CHRG_STAT_BAT_SAFE_MODE  BIT(3)
> +#define CHRG_STAT_BAT_VALID  BIT(4)
> +#define CHRG_STAT_BAT_PRESENTBIT(5)
> +#define CHRG_STAT_CHARGING   BIT(6)
> +#define CHRG_STAT_PMIC_OTP   BIT(7)
>  
>  #define VBUS_ISPOUT_CUR_LIM_MASK 0x03
>  #define VBUS_ISPOUT_CUR_LIM_BIT_POS  0
> @@ -52,33 +53,33 @@
>  #define VBUS_ISPOUT_VHOLD_SET_OFFSET 4000/* 4000mV */
>  #define VBUS_ISPOUT_VHOLD_SET_LSB_RES100 /* 100mV */
>  #define VBUS_ISPOUT_VHOLD_SET_4300MV 0x3 /* 4300mV */
> -#define VBUS_ISPOUT_VBUS_PATH_DIS(1 << 7)
> +#define VBUS_ISPOUT_VBUS_PATH_DISBIT(7)
>  
>  #define CHRG_CCCV_CC_MASK0xf /* 4 bits */
>  #define CHRG_CCCV_CC_BIT_POS 0
>  #define CHRG_CCCV_CC_OFFSET  200 /* 200mA */
>  #define CHRG_CCCV_CC_LSB_RES 200 /* 200mA */
> -#define CHRG_CCCV_ITERM_20P  (1 << 4)/* 20% of CC */
> +#define CHRG_CCCV_ITERM_20P  BIT(4)  /* 20% of CC */
>  #define CHRG_CCCV_CV_MASK0x60/* 2 bits */
>  #define CHRG_CCCV_CV_BIT_POS 5
>  #define CHRG_CCCV_CV_4100MV  0x0 /* 4.10V */
>  #define CHRG_CCCV_CV_4150MV  0x1 /* 4.15V */
>  #define CHRG_CCCV_CV_4200MV  0x2 /* 4.20V */
>  #define CHRG_CCCV_CV_4350MV  0x3 /* 4.35V */
> -#define CHRG_CCCV_CHG_EN (1 << 7)
> +#define CHRG_CCCV_CHG_EN BIT(7)
>  
>  #define CNTL2_CC_TIMEOUT_MASK0x3 /* 2 bits */
>  #define CNTL2_CC_TIMEOUT_OFFSET  6   /* 6 Hrs */
>  #define CNTL2_CC_TIMEOUT_LSB_RES 2   /* 2 Hrs */
>  #define CNTL2_CC_TIMEOUT_12HRS   0x3 /* 12 Hrs */
> -#define CNTL2_CHGLED_TYPEB   (1 << 4)
> -#define CNTL2_CHG_OUT_TURNON (1 << 5)
> +#define CNTL2_CHGLED_TYPEB   BIT(4)
> +#define CNTL2_CHG_OUT_TURNON BIT(5)
>  #define CNTL2_PC_TIMEOUT_MASK0xC0
>  #define CNTL2_PC_TIMEOUT_OFFSET  40  /* 40 mins */
>  #define CNTL2_PC_TIMEOUT_LSB_RES 10  /* 10 mins */
>  #define CNTL2_PC_TIMEOUT_70MINS  0x3
>  
> -#define CHRG_ILIM_TEMP_LOOP_EN   (1 << 3)
> +#define CHRG_ILIM_TEMP_LOOP_EN   BIT(3)
>  #define CHRG_VBUS_ILIM_MASK  0xf0
>  #define CHRG_VBUS_ILIM_BIT_POS   4
>  #define CHRG_VBUS_ILIM_100MA 0x0 /* 100mA */
> @@ -94,7 +95,7 @@
>  #define CHRG_VLTFC_0C0xA5/* 0 DegC */
>  #define CHRG_VHTFC_45C   0x1F/* 45 DegC */
>  
> -#define FG_CNTL_OCV_ADJ_EN   (1 << 3)
> +#define FG_CNTL_OCV_ADJ_EN   BIT(3)
>  
>  #define CV_4100MV4100/* 4100mV */
>  #define CV_4150MV4150/* 4150mV */
> -- 
> git-series 0.9.1


signature.asc
Description: PGP signature


Re: [PATCH 14/14] power: supply: axp288: use the BIT() macro

2018-12-05 Thread Sebastian Reichel
Hi,

On Mon, Nov 26, 2018 at 05:27:55PM +0200, Priit Laes wrote:
> From: Olliver Schinagl 
> 
> Make use of the recommended BIT() macro for bit defines.
> 
> Signed-off-by: Olliver Schinagl 
> Signed-off-by: Priit Laes 
> ---

Thanks, queued to power-supply-next.

-- Sebastian

>  drivers/power/supply/axp288_charger.c | 35 ++--
>  1 file changed, 18 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/power/supply/axp288_charger.c 
> b/drivers/power/supply/axp288_charger.c
> index 735658e..f8c6da9 100644
> --- a/drivers/power/supply/axp288_charger.c
> +++ b/drivers/power/supply/axp288_charger.c
> @@ -16,6 +16,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -29,17 +30,17 @@
>  #include 
>  #include 
>  
> -#define PS_STAT_VBUS_TRIGGER (1 << 0)
> -#define PS_STAT_BAT_CHRG_DIR (1 << 2)
> -#define PS_STAT_VBAT_ABOVE_VHOLD (1 << 3)
> -#define PS_STAT_VBUS_VALID   (1 << 4)
> -#define PS_STAT_VBUS_PRESENT (1 << 5)
> +#define PS_STAT_VBUS_TRIGGER BIT(0)
> +#define PS_STAT_BAT_CHRG_DIR BIT(2)
> +#define PS_STAT_VBAT_ABOVE_VHOLD BIT(3)
> +#define PS_STAT_VBUS_VALID   BIT(4)
> +#define PS_STAT_VBUS_PRESENT BIT(5)
>  
> -#define CHRG_STAT_BAT_SAFE_MODE  (1 << 3)
> -#define CHRG_STAT_BAT_VALID  (1 << 4)
> -#define CHRG_STAT_BAT_PRESENT(1 << 5)
> -#define CHRG_STAT_CHARGING   (1 << 6)
> -#define CHRG_STAT_PMIC_OTP   (1 << 7)
> +#define CHRG_STAT_BAT_SAFE_MODE  BIT(3)
> +#define CHRG_STAT_BAT_VALID  BIT(4)
> +#define CHRG_STAT_BAT_PRESENTBIT(5)
> +#define CHRG_STAT_CHARGING   BIT(6)
> +#define CHRG_STAT_PMIC_OTP   BIT(7)
>  
>  #define VBUS_ISPOUT_CUR_LIM_MASK 0x03
>  #define VBUS_ISPOUT_CUR_LIM_BIT_POS  0
> @@ -52,33 +53,33 @@
>  #define VBUS_ISPOUT_VHOLD_SET_OFFSET 4000/* 4000mV */
>  #define VBUS_ISPOUT_VHOLD_SET_LSB_RES100 /* 100mV */
>  #define VBUS_ISPOUT_VHOLD_SET_4300MV 0x3 /* 4300mV */
> -#define VBUS_ISPOUT_VBUS_PATH_DIS(1 << 7)
> +#define VBUS_ISPOUT_VBUS_PATH_DISBIT(7)
>  
>  #define CHRG_CCCV_CC_MASK0xf /* 4 bits */
>  #define CHRG_CCCV_CC_BIT_POS 0
>  #define CHRG_CCCV_CC_OFFSET  200 /* 200mA */
>  #define CHRG_CCCV_CC_LSB_RES 200 /* 200mA */
> -#define CHRG_CCCV_ITERM_20P  (1 << 4)/* 20% of CC */
> +#define CHRG_CCCV_ITERM_20P  BIT(4)  /* 20% of CC */
>  #define CHRG_CCCV_CV_MASK0x60/* 2 bits */
>  #define CHRG_CCCV_CV_BIT_POS 5
>  #define CHRG_CCCV_CV_4100MV  0x0 /* 4.10V */
>  #define CHRG_CCCV_CV_4150MV  0x1 /* 4.15V */
>  #define CHRG_CCCV_CV_4200MV  0x2 /* 4.20V */
>  #define CHRG_CCCV_CV_4350MV  0x3 /* 4.35V */
> -#define CHRG_CCCV_CHG_EN (1 << 7)
> +#define CHRG_CCCV_CHG_EN BIT(7)
>  
>  #define CNTL2_CC_TIMEOUT_MASK0x3 /* 2 bits */
>  #define CNTL2_CC_TIMEOUT_OFFSET  6   /* 6 Hrs */
>  #define CNTL2_CC_TIMEOUT_LSB_RES 2   /* 2 Hrs */
>  #define CNTL2_CC_TIMEOUT_12HRS   0x3 /* 12 Hrs */
> -#define CNTL2_CHGLED_TYPEB   (1 << 4)
> -#define CNTL2_CHG_OUT_TURNON (1 << 5)
> +#define CNTL2_CHGLED_TYPEB   BIT(4)
> +#define CNTL2_CHG_OUT_TURNON BIT(5)
>  #define CNTL2_PC_TIMEOUT_MASK0xC0
>  #define CNTL2_PC_TIMEOUT_OFFSET  40  /* 40 mins */
>  #define CNTL2_PC_TIMEOUT_LSB_RES 10  /* 10 mins */
>  #define CNTL2_PC_TIMEOUT_70MINS  0x3
>  
> -#define CHRG_ILIM_TEMP_LOOP_EN   (1 << 3)
> +#define CHRG_ILIM_TEMP_LOOP_EN   BIT(3)
>  #define CHRG_VBUS_ILIM_MASK  0xf0
>  #define CHRG_VBUS_ILIM_BIT_POS   4
>  #define CHRG_VBUS_ILIM_100MA 0x0 /* 100mA */
> @@ -94,7 +95,7 @@
>  #define CHRG_VLTFC_0C0xA5/* 0 DegC */
>  #define CHRG_VHTFC_45C   0x1F/* 45 DegC */
>  
> -#define FG_CNTL_OCV_ADJ_EN   (1 << 3)
> +#define FG_CNTL_OCV_ADJ_EN   BIT(3)
>  
>  #define CV_4100MV4100/* 4100mV */
>  #define CV_4150MV4150/* 4150mV */
> -- 
> git-series 0.9.1


signature.asc
Description: PGP signature


Re: [PATCH 13/14] power: supply: axp20x: add missing include bitops.h

2018-12-05 Thread Sebastian Reichel
Hi,

On Mon, Nov 26, 2018 at 05:27:54PM +0200, Priit Laes wrote:
> From: Olliver Schinagl 
> 
> The axp20x_usb_power driver uses BIT() operations but lacks the include
> for it. Include the bitops.h header file.
> 
> Signed-off-by: Olliver Schinagl 
> Signed-off-by: Priit Laes 
> ---
>  drivers/power/supply/axp20x_usb_power.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/power/supply/axp20x_usb_power.c 
> b/drivers/power/supply/axp20x_usb_power.c
> index 42001df..f52fe77 100644
> --- a/drivers/power/supply/axp20x_usb_power.c
> +++ b/drivers/power/supply/axp20x_usb_power.c
> @@ -10,6 +10,7 @@
>   * option) any later version.
>   */
>  
> +#include 
>  #include 
>  #include 
>  #include 
> -- 
> git-series 0.9.1

Thanks, queued to power-supply-next.

-- Sebastian


signature.asc
Description: PGP signature


Re: [PATCH 13/14] power: supply: axp20x: add missing include bitops.h

2018-12-05 Thread Sebastian Reichel
Hi,

On Mon, Nov 26, 2018 at 05:27:54PM +0200, Priit Laes wrote:
> From: Olliver Schinagl 
> 
> The axp20x_usb_power driver uses BIT() operations but lacks the include
> for it. Include the bitops.h header file.
> 
> Signed-off-by: Olliver Schinagl 
> Signed-off-by: Priit Laes 
> ---
>  drivers/power/supply/axp20x_usb_power.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/power/supply/axp20x_usb_power.c 
> b/drivers/power/supply/axp20x_usb_power.c
> index 42001df..f52fe77 100644
> --- a/drivers/power/supply/axp20x_usb_power.c
> +++ b/drivers/power/supply/axp20x_usb_power.c
> @@ -10,6 +10,7 @@
>   * option) any later version.
>   */
>  
> +#include 
>  #include 
>  #include 
>  #include 
> -- 
> git-series 0.9.1

Thanks, queued to power-supply-next.

-- Sebastian


signature.asc
Description: PGP signature


Re: [RFC PATCH 02/14] mm/hms: heterogenenous memory system (HMS) documentation

2018-12-05 Thread Logan Gunthorpe



On 2018-12-04 7:31 p.m., Jerome Glisse wrote:
> How can i express multiple link, or memory that is only accessible
> by a subset of the devices/CPUs. In today model they are back in
> assumption like everyone can access all the node which do not hold
> in what i am trying to do.

Well multiple links are easy when you have a 'link' bus. Just add
another link device under the bus.

Technically, the accessibility issue is already encoded in sysfs. For
example, through the PCI tree you can determine which ACS bits are set
and determine which devices are behind the same root bridge the same way
we do in the kernel p2pdma subsystem. This is all bus specific which is
fine, but if we want to change that, we should have a common way for
existing buses to describe these attributes in the existing tree. The
new 'link' bus devices would have to have some way to describe cases if
memory isn't accessible in some way across it.

But really, I would say the kernel is responsible for telling you when
memory is accessible to a list of initiators, so it should be part of
the checks in a theoretical hbind api. This is already the approach
p2pdma takes in-kernel: we have functions that tell you if two PCI
devices can talk to each other and we have functions to give you memory
accessible by a set of devices. What we don't have is a special tree
that p2pdma users have to walk through to determine accessibility.

In my eye's, you are just conflating a bunch of different issues that
are better solved independently in the existing frameworks we have. And
if they were tackled individually, you'd have a much easier time getting
them merged one by one.

Logan


Re: [RFC PATCH 02/14] mm/hms: heterogenenous memory system (HMS) documentation

2018-12-05 Thread Logan Gunthorpe



On 2018-12-04 7:31 p.m., Jerome Glisse wrote:
> How can i express multiple link, or memory that is only accessible
> by a subset of the devices/CPUs. In today model they are back in
> assumption like everyone can access all the node which do not hold
> in what i am trying to do.

Well multiple links are easy when you have a 'link' bus. Just add
another link device under the bus.

Technically, the accessibility issue is already encoded in sysfs. For
example, through the PCI tree you can determine which ACS bits are set
and determine which devices are behind the same root bridge the same way
we do in the kernel p2pdma subsystem. This is all bus specific which is
fine, but if we want to change that, we should have a common way for
existing buses to describe these attributes in the existing tree. The
new 'link' bus devices would have to have some way to describe cases if
memory isn't accessible in some way across it.

But really, I would say the kernel is responsible for telling you when
memory is accessible to a list of initiators, so it should be part of
the checks in a theoretical hbind api. This is already the approach
p2pdma takes in-kernel: we have functions that tell you if two PCI
devices can talk to each other and we have functions to give you memory
accessible by a set of devices. What we don't have is a special tree
that p2pdma users have to walk through to determine accessibility.

In my eye's, you are just conflating a bunch of different issues that
are better solved independently in the existing frameworks we have. And
if they were tackled individually, you'd have a much easier time getting
them merged one by one.

Logan


Re: [PATCH v2 1/2] perf cs-etm: Set branch instruction flags in packet

2018-12-05 Thread Mathieu Poirier
On Tue, 4 Dec 2018 at 23:26,  wrote:
>
> On Mon, Nov 19, 2018 at 03:26:17PM -0700, Mathieu Poirier wrote:
> > On Sun, Nov 11, 2018 at 01:07:55PM +0800, Leo Yan wrote:
> > > The perf sample data contains flags to indicate the hardware trace data
> > > is belonging to which type branch instruction, thus this can be used to
> > > print out the human readable string.  Arm CoreSight ETM sample data is
> > > missed to set flags and it is always set to zeros, this results in perf
> > > tool skips to print string for instruction types.
> > >
> > > Arm CoreSight ETM supports different kinds instruction of A64, A32 and
> > > T32; this patch is to set branch instruction flags in packet for these
> > > ISAs.
> > >
> > > The brief idea for patch implementation is describe as below:
> > >
> > > - For element with OCSD_GEN_TRC_ELEM_TRACE_ON type, it is taken as trace
> > >   beginning packet; for element with OCSD_GEN_TRC_ELEM_NO_SYNC or
> > >   OCSD_GEN_TRC_ELEM_EO_TRACE, these two kinds elements are used to set
> > >   for trace end;
> > >
> > >   As Mike suggested the packet stream might have more than one two
> > >   TRACE_ON packets, the first one TRACE_ON packet indicates trace end
> > >   and the second one is taken as trace restarting.  We will handle this
> > >   special case in the upper layer with packet queue handling, which has
> > >   more context so it's more suitable fix up for it.  This will be
> > >   accomplished in the sequential patch.
> > >
> > > - For instruction range packet, mainly base on three factors to decide
> > >   the branch instruction types:
> > >
> > >   elem->last_i_type
> > >   elem->last_i_subtype
> > >   elem->last_instr_cond
> > >
> > >   If the instruction is immediate branch but without link and return
> > >   flag, we consider it as function internal branch;  in fact the
> > >   immediate branch also can be used to invoke the function entry,
> > >   usually this is only used in assembly code to directly call a symbol
> > >   and don't expect to return back; after reviewing kernel normal
> > >   functions and user space programs, both of them are very seldom to use
> > >   immediate branch for function call.  On the other hand, if we want to
> > >   decide the immediate branch is for function branch jumping or for
> > >   function calling, we need to rely on the start address of next packet
> > >   and check the symbol offset for the start address,  this will
> > >   introduce much complexity in the implementation.  So for this version
> > >   we simply consider immediate branch as function internal branch.
> > >   Moreover, we rely on 'elem->last_instr_cond' to decide if the branch
> > >   instruction is a conditional branch or not.
> > >
> > >   If the instruction is immediate branch with link, it's instruction
> > >   'BL' and which is used for function call.
> > >
> > >   If the instruction is indirect branch and with subtype
> > >   OCSD_S_INSTR_V7_IMPLIED_RET, the decoders gives the hint the function
> > >   return for below cases related with A32/T32 instruction; set this
> > >   branch flag as function return (Thanks for Al's suggestion).
> > >
> > > BX R14
> > > MOV PC, LR
> > > POP {…, PC}
> > > LDR PC, [SP], #offset
> > >
> > >   If the instruction is indirect branch without link, this is
> > >   corresponding to instruction 'BR', this instruction usually is used
> > >   for dynamic link lib with below usage; so we think it's a return
> > >   instruction.
> > >
> > > 0680 <.plt>:
> > >  680:   a9bf7bf0stp x16, x30, [sp, #-16]!
> > >  684:   9090adrpx16, 1 <__FRAME_END__+0xf630>
> > >  688:   f947fe11ldr x17, [x16, #4088]
> > >  68c:   913fe210add x16, x16, #0xff8
> > >  690:   d61f0220br  x17
> > >
> > >   If the instruction is indirect branch with link, e.g BLR, we think
> > >   it's a function call.
> > >
> > >   For function return, ARMv8 introduces a dedicated instruction 'ret',
> > >   which has flag of OCSD_S_INSTR_V8_RET.
> > >
> > > - For exception packets, this patch divides into three types:
> > >
> > >   The first type of exception is caused by external logics like bus,
> > >   interrupt controller, debug module or PE reset or halt; this is
> > >   corresponding to flags "bcyi" which defined in doc perf-script.txt;
> > >
> > >   The second type is for system call, this is set as "bcs" by following
> > >   definition in the doc;
> > >
> > >   The third type is for CPU trap, data and instruction prefetch abort,
> > >   alignment abort; usually these exceptions are synchronous for CPU, so
> > >   set them as "bci" type.
> >
> > This is too long and needs to be broken down into pieces.  I would split 
> > this
> > patch in 3 heat, one for NO_SYNC and TRACE_ON, one for INSTR_RANGE and one 
> > for
> > ELEM_EXCEPTION/ELEM_EXCEPTION_RET.
>
> This makes sense, will split to 3 patches.
>
> > >
> > > Cc: Mathieu Poirier 
> > > Cc: Mike Leach 
> > > 

Re: [PATCH v2 1/2] perf cs-etm: Set branch instruction flags in packet

2018-12-05 Thread Mathieu Poirier
On Tue, 4 Dec 2018 at 23:26,  wrote:
>
> On Mon, Nov 19, 2018 at 03:26:17PM -0700, Mathieu Poirier wrote:
> > On Sun, Nov 11, 2018 at 01:07:55PM +0800, Leo Yan wrote:
> > > The perf sample data contains flags to indicate the hardware trace data
> > > is belonging to which type branch instruction, thus this can be used to
> > > print out the human readable string.  Arm CoreSight ETM sample data is
> > > missed to set flags and it is always set to zeros, this results in perf
> > > tool skips to print string for instruction types.
> > >
> > > Arm CoreSight ETM supports different kinds instruction of A64, A32 and
> > > T32; this patch is to set branch instruction flags in packet for these
> > > ISAs.
> > >
> > > The brief idea for patch implementation is describe as below:
> > >
> > > - For element with OCSD_GEN_TRC_ELEM_TRACE_ON type, it is taken as trace
> > >   beginning packet; for element with OCSD_GEN_TRC_ELEM_NO_SYNC or
> > >   OCSD_GEN_TRC_ELEM_EO_TRACE, these two kinds elements are used to set
> > >   for trace end;
> > >
> > >   As Mike suggested the packet stream might have more than one two
> > >   TRACE_ON packets, the first one TRACE_ON packet indicates trace end
> > >   and the second one is taken as trace restarting.  We will handle this
> > >   special case in the upper layer with packet queue handling, which has
> > >   more context so it's more suitable fix up for it.  This will be
> > >   accomplished in the sequential patch.
> > >
> > > - For instruction range packet, mainly base on three factors to decide
> > >   the branch instruction types:
> > >
> > >   elem->last_i_type
> > >   elem->last_i_subtype
> > >   elem->last_instr_cond
> > >
> > >   If the instruction is immediate branch but without link and return
> > >   flag, we consider it as function internal branch;  in fact the
> > >   immediate branch also can be used to invoke the function entry,
> > >   usually this is only used in assembly code to directly call a symbol
> > >   and don't expect to return back; after reviewing kernel normal
> > >   functions and user space programs, both of them are very seldom to use
> > >   immediate branch for function call.  On the other hand, if we want to
> > >   decide the immediate branch is for function branch jumping or for
> > >   function calling, we need to rely on the start address of next packet
> > >   and check the symbol offset for the start address,  this will
> > >   introduce much complexity in the implementation.  So for this version
> > >   we simply consider immediate branch as function internal branch.
> > >   Moreover, we rely on 'elem->last_instr_cond' to decide if the branch
> > >   instruction is a conditional branch or not.
> > >
> > >   If the instruction is immediate branch with link, it's instruction
> > >   'BL' and which is used for function call.
> > >
> > >   If the instruction is indirect branch and with subtype
> > >   OCSD_S_INSTR_V7_IMPLIED_RET, the decoders gives the hint the function
> > >   return for below cases related with A32/T32 instruction; set this
> > >   branch flag as function return (Thanks for Al's suggestion).
> > >
> > > BX R14
> > > MOV PC, LR
> > > POP {…, PC}
> > > LDR PC, [SP], #offset
> > >
> > >   If the instruction is indirect branch without link, this is
> > >   corresponding to instruction 'BR', this instruction usually is used
> > >   for dynamic link lib with below usage; so we think it's a return
> > >   instruction.
> > >
> > > 0680 <.plt>:
> > >  680:   a9bf7bf0stp x16, x30, [sp, #-16]!
> > >  684:   9090adrpx16, 1 <__FRAME_END__+0xf630>
> > >  688:   f947fe11ldr x17, [x16, #4088]
> > >  68c:   913fe210add x16, x16, #0xff8
> > >  690:   d61f0220br  x17
> > >
> > >   If the instruction is indirect branch with link, e.g BLR, we think
> > >   it's a function call.
> > >
> > >   For function return, ARMv8 introduces a dedicated instruction 'ret',
> > >   which has flag of OCSD_S_INSTR_V8_RET.
> > >
> > > - For exception packets, this patch divides into three types:
> > >
> > >   The first type of exception is caused by external logics like bus,
> > >   interrupt controller, debug module or PE reset or halt; this is
> > >   corresponding to flags "bcyi" which defined in doc perf-script.txt;
> > >
> > >   The second type is for system call, this is set as "bcs" by following
> > >   definition in the doc;
> > >
> > >   The third type is for CPU trap, data and instruction prefetch abort,
> > >   alignment abort; usually these exceptions are synchronous for CPU, so
> > >   set them as "bci" type.
> >
> > This is too long and needs to be broken down into pieces.  I would split 
> > this
> > patch in 3 heat, one for NO_SYNC and TRACE_ON, one for INSTR_RANGE and one 
> > for
> > ELEM_EXCEPTION/ELEM_EXCEPTION_RET.
>
> This makes sense, will split to 3 patches.
>
> > >
> > > Cc: Mathieu Poirier 
> > > Cc: Mike Leach 
> > > 

Re: [PATCH] firmware: qcom: scm: fix compilation error when disabled

2018-12-05 Thread Bjorn Andersson
+ linux-arm-msm

On Wed 21 Nov 18:32 PST 2018, Jonathan Marek wrote:

> This fixes the case when CONFIG_QCOM_SCM is not enabled, and linux/errno.h
> has not been included previously.
> 
> Signed-off-by: Jonathan Marek 

Reviewed-by: Bjorn Andersson 

Regards,
Bjorn

> ---
> drm/msm includes qcom_scm.h, but can be used with non-qcom hw (IMX5)
> 
>  include/linux/qcom_scm.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/include/linux/qcom_scm.h b/include/linux/qcom_scm.h
> index 06996ad4f2bc..1637385bcc17 100644
> --- a/include/linux/qcom_scm.h
> +++ b/include/linux/qcom_scm.h
> @@ -67,6 +67,9 @@ extern int qcom_scm_iommu_secure_ptbl_init(u64 addr, u32 
> size, u32 spare);
>  extern int qcom_scm_io_readl(phys_addr_t addr, unsigned int *val);
>  extern int qcom_scm_io_writel(phys_addr_t addr, unsigned int val);
>  #else
> +
> +#include 
> +
>  static inline
>  int qcom_scm_set_cold_boot_addr(void *entry, const cpumask_t *cpus)
>  {
> -- 
> 2.17.1
> 


Re: [PATCH] firmware: qcom: scm: fix compilation error when disabled

2018-12-05 Thread Bjorn Andersson
+ linux-arm-msm

On Wed 21 Nov 18:32 PST 2018, Jonathan Marek wrote:

> This fixes the case when CONFIG_QCOM_SCM is not enabled, and linux/errno.h
> has not been included previously.
> 
> Signed-off-by: Jonathan Marek 

Reviewed-by: Bjorn Andersson 

Regards,
Bjorn

> ---
> drm/msm includes qcom_scm.h, but can be used with non-qcom hw (IMX5)
> 
>  include/linux/qcom_scm.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/include/linux/qcom_scm.h b/include/linux/qcom_scm.h
> index 06996ad4f2bc..1637385bcc17 100644
> --- a/include/linux/qcom_scm.h
> +++ b/include/linux/qcom_scm.h
> @@ -67,6 +67,9 @@ extern int qcom_scm_iommu_secure_ptbl_init(u64 addr, u32 
> size, u32 spare);
>  extern int qcom_scm_io_readl(phys_addr_t addr, unsigned int *val);
>  extern int qcom_scm_io_writel(phys_addr_t addr, unsigned int val);
>  #else
> +
> +#include 
> +
>  static inline
>  int qcom_scm_set_cold_boot_addr(void *entry, const cpumask_t *cpus)
>  {
> -- 
> 2.17.1
> 


Re: [PATCH 10/12] staging: rtl8188eu: correct indentation in update_wireless_mode()

2018-12-05 Thread Joe Perches
On Wed, 2018-12-05 at 18:02 +0100, Michael Straube wrote:
> Correct indentation in update_wireless_mode() to clear a checkpatch
> warning. WARNING: Statements should start on a tabstop
[]
> diff --git a/drivers/staging/rtl8188eu/core/rtw_wlan_util.c 
> b/drivers/staging/rtl8188eu/core/rtw_wlan_util.c
[]
> @@ -1442,7 +1442,7 @@ void update_wireless_mode(struct adapter *padapter)
>  
>   if (pmlmeext->cur_wireless_mode & WIRELESS_11B)
>   update_mgnt_tx_rate(padapter, IEEE80211_CCK_RATE_1MB);
> -  else
> + else
>   update_mgnt_tx_rate(padapter, IEEE80211_OFDM_RATE_6MB);
>  }
.
gcc generally emits smaller code by using a single
call and a ternary.

$ size drivers/staging/rtl8188eu/core/rtw_wlan_util.o*
   textdata bss dec hex filename
  14974  36   0   150103aa2 
drivers/staging/rtl8188eu/core/rtw_wlan_util.o.new
  14990  36   0   150263ab2 
drivers/staging/rtl8188eu/core/rtw_wlan_util.o.old
---
 drivers/staging/rtl8188eu/core/rtw_wlan_util.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/rtl8188eu/core/rtw_wlan_util.c 
b/drivers/staging/rtl8188eu/core/rtw_wlan_util.c
index 24918223499b..acad30f53f5d 100644
--- a/drivers/staging/rtl8188eu/core/rtw_wlan_util.c
+++ b/drivers/staging/rtl8188eu/core/rtw_wlan_util.c
@@ -1440,10 +1440,9 @@ void update_wireless_mode(struct adapter *padapter)
 
rtw_hal_set_hwreg(padapter, HW_VAR_RESP_SIFS,  (u8 *)_Timer);
 
-   if (pmlmeext->cur_wireless_mode & WIRELESS_11B)
-   update_mgnt_tx_rate(padapter, IEEE80211_CCK_RATE_1MB);
-else
-   update_mgnt_tx_rate(padapter, IEEE80211_OFDM_RATE_6MB);
+   update_mgnt_tx_rate(padapter,
+   pmlmeext->cur_wireless_mode & WIRELESS_11B ?
+   IEEE80211_CCK_RATE_1MB : IEEE80211_OFDM_RATE_6MB);
 }
 
 void update_bmc_sta_support_rate(struct adapter *padapter, u32 mac_id)




Re: [PATCH 10/12] staging: rtl8188eu: correct indentation in update_wireless_mode()

2018-12-05 Thread Joe Perches
On Wed, 2018-12-05 at 18:02 +0100, Michael Straube wrote:
> Correct indentation in update_wireless_mode() to clear a checkpatch
> warning. WARNING: Statements should start on a tabstop
[]
> diff --git a/drivers/staging/rtl8188eu/core/rtw_wlan_util.c 
> b/drivers/staging/rtl8188eu/core/rtw_wlan_util.c
[]
> @@ -1442,7 +1442,7 @@ void update_wireless_mode(struct adapter *padapter)
>  
>   if (pmlmeext->cur_wireless_mode & WIRELESS_11B)
>   update_mgnt_tx_rate(padapter, IEEE80211_CCK_RATE_1MB);
> -  else
> + else
>   update_mgnt_tx_rate(padapter, IEEE80211_OFDM_RATE_6MB);
>  }
.
gcc generally emits smaller code by using a single
call and a ternary.

$ size drivers/staging/rtl8188eu/core/rtw_wlan_util.o*
   textdata bss dec hex filename
  14974  36   0   150103aa2 
drivers/staging/rtl8188eu/core/rtw_wlan_util.o.new
  14990  36   0   150263ab2 
drivers/staging/rtl8188eu/core/rtw_wlan_util.o.old
---
 drivers/staging/rtl8188eu/core/rtw_wlan_util.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/rtl8188eu/core/rtw_wlan_util.c 
b/drivers/staging/rtl8188eu/core/rtw_wlan_util.c
index 24918223499b..acad30f53f5d 100644
--- a/drivers/staging/rtl8188eu/core/rtw_wlan_util.c
+++ b/drivers/staging/rtl8188eu/core/rtw_wlan_util.c
@@ -1440,10 +1440,9 @@ void update_wireless_mode(struct adapter *padapter)
 
rtw_hal_set_hwreg(padapter, HW_VAR_RESP_SIFS,  (u8 *)_Timer);
 
-   if (pmlmeext->cur_wireless_mode & WIRELESS_11B)
-   update_mgnt_tx_rate(padapter, IEEE80211_CCK_RATE_1MB);
-else
-   update_mgnt_tx_rate(padapter, IEEE80211_OFDM_RATE_6MB);
+   update_mgnt_tx_rate(padapter,
+   pmlmeext->cur_wireless_mode & WIRELESS_11B ?
+   IEEE80211_CCK_RATE_1MB : IEEE80211_OFDM_RATE_6MB);
 }
 
 void update_bmc_sta_support_rate(struct adapter *padapter, u32 mac_id)




Re: [PATCH 1/8] perf: Allow to block process in syscall tracepoints

2018-12-05 Thread Steven Rostedt
On Wed,  5 Dec 2018 17:05:02 +0100
Jiri Olsa  wrote:

> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 3b2490b81918..e55cf9169a03 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -60,6 +60,32 @@ static void do_audit_syscall_entry(struct pt_regs *regs, 
> u32 arch)
>   }
>  }
>  
> +static void trace_block_syscall(struct pt_regs *regs, bool enter)
> +{
> + current->perf_blocked = true;
> +
> + do {
> + schedule_timeout(100 * HZ);
> + current->perf_blocked_cnt = 0;
> +
> + if (enter) {
> + /* perf syscalls:* enter */
> + perf_trace_syscall_enter(regs);
> +
> + /* perf raw_syscalls:* enter */
> + perf_trace_sys_enter(_sys_enter, regs, 
> regs->orig_ax);
> + } else {
> + /* perf syscalls:* enter */
> + perf_trace_syscall_exit(regs);
> +
> + /* perf raw_syscalls:* enter */
> + perf_trace_sys_exit(_sys_exit, regs, regs->ax);
> + }
> + } while (current->perf_blocked_cnt);

I was thinking, if the process reading the perf buffer dies, how do we
tell this task to continue on?

-- Steve

> +
> + current->perf_blocked = false;
> +}
> +


Re: [PATCH 1/8] perf: Allow to block process in syscall tracepoints

2018-12-05 Thread Steven Rostedt
On Wed,  5 Dec 2018 17:05:02 +0100
Jiri Olsa  wrote:

> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 3b2490b81918..e55cf9169a03 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -60,6 +60,32 @@ static void do_audit_syscall_entry(struct pt_regs *regs, 
> u32 arch)
>   }
>  }
>  
> +static void trace_block_syscall(struct pt_regs *regs, bool enter)
> +{
> + current->perf_blocked = true;
> +
> + do {
> + schedule_timeout(100 * HZ);
> + current->perf_blocked_cnt = 0;
> +
> + if (enter) {
> + /* perf syscalls:* enter */
> + perf_trace_syscall_enter(regs);
> +
> + /* perf raw_syscalls:* enter */
> + perf_trace_sys_enter(_sys_enter, regs, 
> regs->orig_ax);
> + } else {
> + /* perf syscalls:* enter */
> + perf_trace_syscall_exit(regs);
> +
> + /* perf raw_syscalls:* enter */
> + perf_trace_sys_exit(_sys_exit, regs, regs->ax);
> + }
> + } while (current->perf_blocked_cnt);

I was thinking, if the process reading the perf buffer dies, how do we
tell this task to continue on?

-- Steve

> +
> + current->perf_blocked = false;
> +}
> +


Re: [PATCH] perf, tools: Support srccode output

2018-12-05 Thread Andi Kleen
On Wed, Dec 05, 2018 at 01:28:38PM +0100, Jiri Olsa wrote:
> On Mon, Dec 03, 2018 at 04:18:48PM -0800, Andi Kleen wrote:
> > From: Andi Kleen 
> > 
> > When looking at PT or brstackinsn traces with perf script
> > it can be very useful to see the source code. This adds a simple
> > facility to print them with perf script, if the information
> > is available through dwarf
> > 
> > % perf record ...
> > % perf script -F insn,ip,sym,srccode
> > ...
> > 
> >   4004c6 main
> > 5   for (i = 0; i < 1000; i++)
> >4004cd main
> > 5   for (i = 0; i < 1000; i++)
> >4004c6 main
> > 5   for (i = 0; i < 1000; i++)
> >4004cd main
> > 5   for (i = 0; i < 1000; i++)
> >4004cd main
> > 5   for (i = 0; i < 1000; i++)
> >4004cd main
> > 5   for (i = 0; i < 1000; i++)
> >4004cd main
> > 5   for (i = 0; i < 1000; i++)
> >4004cd main
> > 5   for (i = 0; i < 1000; i++)
> >4004b3 main
> > 6   v++;
> > 
> > % perf record -b ...
> > % perf script -F insn,ip,sym,srccode,brstackinsn
> > 
> > ...
> >main+22:
> > 00400543insn: e8 ca ff ff ff# PRED
> > |18 f1();
> > f1:
> > 00400512insn: 55
> > |10   {
> > 00400513insn: 48 89 e5
> > 00400516insn: b8 00 00 00 00
> > |11 f2();
> > 0040051binsn: e8 d6 ff ff ff# PRED
> > f2:
> > 004004f6insn: 55
> > |5{
> > 004004f7insn: 48 89 e5
> > 004004fainsn: 8b 05 2c 0b 20 00
> > |6  c = a / b;
> > 00400500insn: 8b 0d 2a 0b 20 00
> > 00400506insn: 99
> > 00400507insn: f7 f9
> > 00400509insn: 89 05 29 0b 20 00
> > 0040050finsn: 90
> > |7}
> > 00400510insn: 5d
> > 00400511insn: c3# PRED
> > f1+14:
> > 00400520insn: b8 00 00 00 00
> > |12 f2();
> > 00400525insn: e8 cc ff ff ff# PRED
> > f2:
> > 004004f6insn: 55
> > |5{
> > 004004f7insn: 48 89 e5
> > 004004fainsn: 8b 05 2c 0b 20 00
> > |6  c = a / b;
> > 
> > Not supported for callchains currently, would need some
> > layout changes there.
> 
> nice, works nicely, especialy with --xed
> 
> Acked-by: Jiri Olsa 
> 
> I need to check, but is there a way to make this work with
> perf-with-kcore script? looks like we endup with kcore and
> won't allow vmlinux lookups for debug info

Hmm, probably not currently. Would somehow need to point
to the debuginfo. Perhaps if /proc/kcore was fixed
to report the debugid, and then could find it this way.

Or we could add an option to perf report to add a vmlinux
only for debug info that is not used for reading the code.

-andi


Re: [PATCH] perf, tools: Support srccode output

2018-12-05 Thread Andi Kleen
On Wed, Dec 05, 2018 at 01:28:38PM +0100, Jiri Olsa wrote:
> On Mon, Dec 03, 2018 at 04:18:48PM -0800, Andi Kleen wrote:
> > From: Andi Kleen 
> > 
> > When looking at PT or brstackinsn traces with perf script
> > it can be very useful to see the source code. This adds a simple
> > facility to print them with perf script, if the information
> > is available through dwarf
> > 
> > % perf record ...
> > % perf script -F insn,ip,sym,srccode
> > ...
> > 
> >   4004c6 main
> > 5   for (i = 0; i < 1000; i++)
> >4004cd main
> > 5   for (i = 0; i < 1000; i++)
> >4004c6 main
> > 5   for (i = 0; i < 1000; i++)
> >4004cd main
> > 5   for (i = 0; i < 1000; i++)
> >4004cd main
> > 5   for (i = 0; i < 1000; i++)
> >4004cd main
> > 5   for (i = 0; i < 1000; i++)
> >4004cd main
> > 5   for (i = 0; i < 1000; i++)
> >4004cd main
> > 5   for (i = 0; i < 1000; i++)
> >4004b3 main
> > 6   v++;
> > 
> > % perf record -b ...
> > % perf script -F insn,ip,sym,srccode,brstackinsn
> > 
> > ...
> >main+22:
> > 00400543insn: e8 ca ff ff ff# PRED
> > |18 f1();
> > f1:
> > 00400512insn: 55
> > |10   {
> > 00400513insn: 48 89 e5
> > 00400516insn: b8 00 00 00 00
> > |11 f2();
> > 0040051binsn: e8 d6 ff ff ff# PRED
> > f2:
> > 004004f6insn: 55
> > |5{
> > 004004f7insn: 48 89 e5
> > 004004fainsn: 8b 05 2c 0b 20 00
> > |6  c = a / b;
> > 00400500insn: 8b 0d 2a 0b 20 00
> > 00400506insn: 99
> > 00400507insn: f7 f9
> > 00400509insn: 89 05 29 0b 20 00
> > 0040050finsn: 90
> > |7}
> > 00400510insn: 5d
> > 00400511insn: c3# PRED
> > f1+14:
> > 00400520insn: b8 00 00 00 00
> > |12 f2();
> > 00400525insn: e8 cc ff ff ff# PRED
> > f2:
> > 004004f6insn: 55
> > |5{
> > 004004f7insn: 48 89 e5
> > 004004fainsn: 8b 05 2c 0b 20 00
> > |6  c = a / b;
> > 
> > Not supported for callchains currently, would need some
> > layout changes there.
> 
> nice, works nicely, especialy with --xed
> 
> Acked-by: Jiri Olsa 
> 
> I need to check, but is there a way to make this work with
> perf-with-kcore script? looks like we endup with kcore and
> won't allow vmlinux lookups for debug info

Hmm, probably not currently. Would somehow need to point
to the debuginfo. Perhaps if /proc/kcore was fixed
to report the debugid, and then could find it this way.

Or we could add an option to perf report to add a vmlinux
only for debug info that is not used for reading the code.

-andi


Re: [RFC PATCH 02/14] mm/hms: heterogenenous memory system (HMS) documentation

2018-12-05 Thread Logan Gunthorpe



On 2018-12-04 7:37 p.m., Jerome Glisse wrote:
>>
>> This came up before for apis even better defined than HMS as well as
>> more limited scope, i.e. experimental ABI availability only for -rc
>> kernels. Linus said this:
>>
>> "There are no loopholes. No "but it's been only one release". No, no,
>> no. The whole point is that users are supposed to be able to *trust*
>> the kernel. If we do something, we keep on doing it.
>>
>> And if it makes it harder to add new user-visible interfaces, then
>> that's a *good* thing." [1]
>>
>> The takeaway being don't land work-in-progress ABIs in the kernel.
>> Once an application depends on it, there are no more incompatible
>> changes possible regardless of the warnings, experimental notices, or
>> "staging" designation. DAX is experimental because there are cases
>> where it currently does not work with respect to another kernel
>> feature like xfs-reflink, RDMA. The plan is to fix those, not continue
>> to hide behind an experimental designation, and fix them in a way that
>> preserves the user visible behavior that has already been exposed,
>> i.e. no regressions.
>>
>> [1]: 
>> https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-August/004742.html
> 
> So i guess i am heading down the vXX road ... such is my life :)

I recommend against it. I really haven't been convinced by any of your
arguments for having a second topology tree. The existing topology tree
in sysfs already better describes the links between hardware right now,
except for the missing GPU links (and those should be addressable within
the GPU community). Plus, maybe, some other enhancements to sockets/numa
node descriptions if there's something missing there.

Then, 'hbind' is another issue but I suspect it would be better
implemented as an ioctl on existing GPU interfaces. I certainly can't
see any benefit in using it myself.

It's better to take an approach that would be less controversial with
the community than to brow beat them with a patch set 20+ times until
they take it.

Logan



Re: [RFC PATCH 02/14] mm/hms: heterogenenous memory system (HMS) documentation

2018-12-05 Thread Logan Gunthorpe



On 2018-12-04 7:37 p.m., Jerome Glisse wrote:
>>
>> This came up before for apis even better defined than HMS as well as
>> more limited scope, i.e. experimental ABI availability only for -rc
>> kernels. Linus said this:
>>
>> "There are no loopholes. No "but it's been only one release". No, no,
>> no. The whole point is that users are supposed to be able to *trust*
>> the kernel. If we do something, we keep on doing it.
>>
>> And if it makes it harder to add new user-visible interfaces, then
>> that's a *good* thing." [1]
>>
>> The takeaway being don't land work-in-progress ABIs in the kernel.
>> Once an application depends on it, there are no more incompatible
>> changes possible regardless of the warnings, experimental notices, or
>> "staging" designation. DAX is experimental because there are cases
>> where it currently does not work with respect to another kernel
>> feature like xfs-reflink, RDMA. The plan is to fix those, not continue
>> to hide behind an experimental designation, and fix them in a way that
>> preserves the user visible behavior that has already been exposed,
>> i.e. no regressions.
>>
>> [1]: 
>> https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-August/004742.html
> 
> So i guess i am heading down the vXX road ... such is my life :)

I recommend against it. I really haven't been convinced by any of your
arguments for having a second topology tree. The existing topology tree
in sysfs already better describes the links between hardware right now,
except for the missing GPU links (and those should be addressable within
the GPU community). Plus, maybe, some other enhancements to sockets/numa
node descriptions if there's something missing there.

Then, 'hbind' is another issue but I suspect it would be better
implemented as an ioctl on existing GPU interfaces. I certainly can't
see any benefit in using it myself.

It's better to take an approach that would be less controversial with
the community than to brow beat them with a patch set 20+ times until
they take it.

Logan



Re: [PATCH 4.19 000/139] 4.19.7-stable review

2018-12-05 Thread Dan Rue
On Wed, Dec 05, 2018 at 02:44:59PM -0200, Rafael David Tinoco wrote:
> On 12/4/18 8:48 AM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.19.7 release.
> > There are 139 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Thu Dec  6 10:36:22 UTC 2018.
> > Anything received after that time might be too late.
> > 
> > The whole patch series can be found in one patch at:
> > 
> > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.7-rc1.gz
> > or in the git tree and branch at:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> > linux-4.19.y
> > and the diffstat can be found below.
> > 
> > thanks,
> > 
> > greg k-h
> 
> 
> Note: As a consequence of the retpoline backport, we upgraded our
> toolchain that is used to build the kernel and userspace from 7.0 to
> 7.3 (containing retpoline support).

Specifically, we went from Linaro's gcc version 7.1.1 20170707 (Linaro
GCC 7.1-2017.08) to upstream's gcc version 7.3.0 (GCC).

> 
> Results from Linaro’s test farm.
> No regressions on arm64, arm, x86_64, and i386.
> 
> Summary
> 
> 
> kernel: 4.19.7-rc1
> git repo:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> git branch: linux-4.19.y
> git commit: 987a6da5152c29e37cc11de9a2d10a23a48015c9
> git describe: v4.19.6-140-g987a6da5152c
> Test details:
> https://qa-reports.linaro.org/lkft/linux-stable-rc-4.19-oe/build/v4.19.6-140-g987a6da5152c
> 
> 
> No Regressions (compared to build v4.19.6)
> 
> 
> No Fixes (compared to build v4.19.6)
> 
> 
> Ran 20285 total tests in the following environments and test suites.
> 
> Environments
> --
> - dragonboard-410c - arm64
> - hi6220-hikey - arm64
> - i386
> - juno-r2 - arm64
> - qemu_arm
> - qemu_arm64
> - qemu_i386
> - qemu_x86_64
> - x15 - arm
> - x86_64
> 
> Test Suites
> ---
> * boot
> * install-android-platform-tools-r2600
> * kselftest
> * libhugetlbfs
> * ltp-cap_bounds-tests
> * ltp-containers-tests
> * ltp-cve-tests
> * ltp-fcntl-locktests-tests
> * ltp-filecaps-tests
> * ltp-fs-tests
> * ltp-fs_bind-tests
> * ltp-fs_perms_simple-tests
> * ltp-fsx-tests
> * ltp-hugetlb-tests
> * ltp-io-tests
> * ltp-ipc-tests
> * ltp-math-tests
> * ltp-nptl-tests
> * ltp-pty-tests
> * ltp-sched-tests
> * ltp-securebits-tests
> * ltp-syscalls-tests
> * ltp-timers-tests
> * ltp-open-posix-tests
> * kselftest-vsyscall-mode-native
> * kselftest-vsyscall-mode-none
> 
> --
> Linaro LKFT
> https://lkft.linaro.org


Re: [PATCH 4.19 000/139] 4.19.7-stable review

2018-12-05 Thread Dan Rue
On Wed, Dec 05, 2018 at 02:44:59PM -0200, Rafael David Tinoco wrote:
> On 12/4/18 8:48 AM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.19.7 release.
> > There are 139 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Thu Dec  6 10:36:22 UTC 2018.
> > Anything received after that time might be too late.
> > 
> > The whole patch series can be found in one patch at:
> > 
> > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.7-rc1.gz
> > or in the git tree and branch at:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> > linux-4.19.y
> > and the diffstat can be found below.
> > 
> > thanks,
> > 
> > greg k-h
> 
> 
> Note: As a consequence of the retpoline backport, we upgraded our
> toolchain that is used to build the kernel and userspace from 7.0 to
> 7.3 (containing retpoline support).

Specifically, we went from Linaro's gcc version 7.1.1 20170707 (Linaro
GCC 7.1-2017.08) to upstream's gcc version 7.3.0 (GCC).

> 
> Results from Linaro’s test farm.
> No regressions on arm64, arm, x86_64, and i386.
> 
> Summary
> 
> 
> kernel: 4.19.7-rc1
> git repo:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> git branch: linux-4.19.y
> git commit: 987a6da5152c29e37cc11de9a2d10a23a48015c9
> git describe: v4.19.6-140-g987a6da5152c
> Test details:
> https://qa-reports.linaro.org/lkft/linux-stable-rc-4.19-oe/build/v4.19.6-140-g987a6da5152c
> 
> 
> No Regressions (compared to build v4.19.6)
> 
> 
> No Fixes (compared to build v4.19.6)
> 
> 
> Ran 20285 total tests in the following environments and test suites.
> 
> Environments
> --
> - dragonboard-410c - arm64
> - hi6220-hikey - arm64
> - i386
> - juno-r2 - arm64
> - qemu_arm
> - qemu_arm64
> - qemu_i386
> - qemu_x86_64
> - x15 - arm
> - x86_64
> 
> Test Suites
> ---
> * boot
> * install-android-platform-tools-r2600
> * kselftest
> * libhugetlbfs
> * ltp-cap_bounds-tests
> * ltp-containers-tests
> * ltp-cve-tests
> * ltp-fcntl-locktests-tests
> * ltp-filecaps-tests
> * ltp-fs-tests
> * ltp-fs_bind-tests
> * ltp-fs_perms_simple-tests
> * ltp-fsx-tests
> * ltp-hugetlb-tests
> * ltp-io-tests
> * ltp-ipc-tests
> * ltp-math-tests
> * ltp-nptl-tests
> * ltp-pty-tests
> * ltp-sched-tests
> * ltp-securebits-tests
> * ltp-syscalls-tests
> * ltp-timers-tests
> * ltp-open-posix-tests
> * kselftest-vsyscall-mode-native
> * kselftest-vsyscall-mode-none
> 
> --
> Linaro LKFT
> https://lkft.linaro.org


Re: [PATCH v3] signal: add procfd_send_signal() syscall

2018-12-05 Thread kbuild test robot
Hi Christian,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.20-rc5]
[cannot apply to next-20181205]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Christian-Brauner/signal-add-procfd_send_signal-syscall/20181205-174512
config: sh-titan_defconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=sh 

All warnings (new ones prefixed by >>):

   :1317:2: warning: #warning syscall pkey_mprotect not implemented 
[-Wcpp]
   :1320:2: warning: #warning syscall pkey_alloc not implemented [-Wcpp]
   :1323:2: warning: #warning syscall pkey_free not implemented [-Wcpp]
   :1326:2: warning: #warning syscall statx not implemented [-Wcpp]
   :1332:2: warning: #warning syscall io_pgetevents not implemented 
[-Wcpp]
   :1335:2: warning: #warning syscall rseq not implemented [-Wcpp]
>> :1338:2: warning: #warning syscall procfd_send_signal not implemented 
>> [-Wcpp]

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH v3] signal: add procfd_send_signal() syscall

2018-12-05 Thread kbuild test robot
Hi Christian,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.20-rc5]
[cannot apply to next-20181205]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Christian-Brauner/signal-add-procfd_send_signal-syscall/20181205-174512
config: sh-titan_defconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=sh 

All warnings (new ones prefixed by >>):

   :1317:2: warning: #warning syscall pkey_mprotect not implemented 
[-Wcpp]
   :1320:2: warning: #warning syscall pkey_alloc not implemented [-Wcpp]
   :1323:2: warning: #warning syscall pkey_free not implemented [-Wcpp]
   :1326:2: warning: #warning syscall statx not implemented [-Wcpp]
   :1332:2: warning: #warning syscall io_pgetevents not implemented 
[-Wcpp]
   :1335:2: warning: #warning syscall rseq not implemented [-Wcpp]
>> :1338:2: warning: #warning syscall procfd_send_signal not implemented 
>> [-Wcpp]

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH V2 5/5] PM / Domains: Propagate performance state updates

2018-12-05 Thread Ulf Hansson
On Wed, 5 Dec 2018 at 07:42, Stephen Boyd  wrote:
>
> Quoting Ulf Hansson (2018-12-03 05:38:35)
> > + Stephen, Mike, Graham
> >
> > On Fri, 30 Nov 2018 at 12:06, Viresh Kumar  wrote:
> > >
> > > On 30-11-18, 11:18, Ulf Hansson wrote:
> > There is one a big difference while comparing with clocks, which make
> > this more difficult.
> >
> > That is, in dev_pm_genpd_set_performance_state(), we are *not* calling
> > ->the set_performance_state() callback of the genpd, unless the genpd
> > is already powered on. Instead, for that case, we are only aggregating
> > the performance states votes, to defer to invoke
> > ->set_performance_state() until the genpd becomes powered on. In some
> > way this makes sense, but for clock_set_rate(), the clock's rate can
> > be changed, no matter if the clock has been prepared/enabled or not.
> >
> > I recall we discussed this behavior of genpd, while introducing the
> > performance states support to it. Reaching this point, introducing the
> > master-domain propagation of performance states votes, we may need to
> > re-consider the behavior, as there is evidently an overhead that grows
> > along with the hierarchy.
> >
> > As a matter of fact, what I think this boils to, is to consider if we
> > want to temporary drop the performance state vote for a device from
> > genpd's ->runtime_suspend() callback. Thus, also restore the vote from
> > genpd's ->runtime_resume() callback. That's because, this is directly
> > related to whether genpd should care about whether it's powered on or
> > off, when calling the ->set_performance_state(). We have had
> > discussions at LKML already around this topic. It seems like we need
> > to pick them up to reach a consensus, before we can move forward with
> > this.
>
> What are we trying to gain consensus on exactly?

Apologize for not being more clear. Let me re-phrase the question.

Should genpd (via genpd's internal runtime suspend callback) reset the
performance state for a device to zero, when that device becomes
runtime suspended?

Of course, if such reset is done, then genpd should also re-compute
the aggregation of the performance states for the corresponding PM
domain (and its master domains). Consequentially, these operations
then needs to be reversed, when the device in question becomes runtime
resumed.

>
> I've been thinking that we would want there to be a performance state
> 'target' for the 'devices are runtime suspended' state that we apply
> when the genpd is powered down from runtime PM suspend of all devices in
> the domain. This would then be overwritten with whatever the aggregated
> performance state is when the genpd is powered back on due to some
> device runtime resuming.

>From the above, I assume "target" to not always be zero. That confirms
my understanding from earlier discussions.

That is sort of what we already have today. The consumer driver are
free to change the performance state, via
dev_pm_genpd_set_performance_state() (or indirectly via
dev_pm_opp_set_rate()), for its device at any time. For example, it
may want to set a new performance state at runtime suspend.

> Don't we already sort of have support for this
> right now in genpd with commit fc5cbf0c94b6 ("PM / Domains: Support for
> multiple states")? So I would think that we either let that state_idx
> member be 0 or some implementation defined 'off' state index of the
> genpd that can be achieved.

Well, implementation details, but thanks for sharing your ideas.

> I was also thinking that the genpd_power_state was more than just a
> bunch of idle states of a device so that we could combine
> on/off/idle/active(performance states?) into one genpd_power_state
> space that is affected by idle and on/off runtime PM status.
>
> In the Qualcomm use-case, the state of the clk, either on or off, is
> factored into the decision to consider the voltage constraint that the
> clk has on the CX domain. So when the clk is disabled, the voltage
> constraint on CX can be removed/ignored in the aggregation equation
> because the hardware isn't clocking. This needs to work properly so that
> devices can disable their clks and save power.
>
> I hope that we can move clk on/off/rate control to be implemented with
> clk domains based upon genpds so that we can generically reason about
> genpds being on/off and at some performance state (i.e. a frequency)
> instead of making something clk specific in the genpd code. If we can do
> that, then this can be stated as a slave genpd (the clk domain) that's
> linked to a master genpd (the voltage/corner domain) can only affect the
> performance state requirement of the master genpd when the slave genpd
> is on. When the slave genpd is off, the performance state requirement is
> dropped and the master domain settles to a lower requirement based on
> the other domains linked to it that are on. The clk domain would turn on
> and off when the set of devices it is attached to are all suspended and
> that would "do the right thing" 

Re: [PATCH V2 5/5] PM / Domains: Propagate performance state updates

2018-12-05 Thread Ulf Hansson
On Wed, 5 Dec 2018 at 07:42, Stephen Boyd  wrote:
>
> Quoting Ulf Hansson (2018-12-03 05:38:35)
> > + Stephen, Mike, Graham
> >
> > On Fri, 30 Nov 2018 at 12:06, Viresh Kumar  wrote:
> > >
> > > On 30-11-18, 11:18, Ulf Hansson wrote:
> > There is one a big difference while comparing with clocks, which make
> > this more difficult.
> >
> > That is, in dev_pm_genpd_set_performance_state(), we are *not* calling
> > ->the set_performance_state() callback of the genpd, unless the genpd
> > is already powered on. Instead, for that case, we are only aggregating
> > the performance states votes, to defer to invoke
> > ->set_performance_state() until the genpd becomes powered on. In some
> > way this makes sense, but for clock_set_rate(), the clock's rate can
> > be changed, no matter if the clock has been prepared/enabled or not.
> >
> > I recall we discussed this behavior of genpd, while introducing the
> > performance states support to it. Reaching this point, introducing the
> > master-domain propagation of performance states votes, we may need to
> > re-consider the behavior, as there is evidently an overhead that grows
> > along with the hierarchy.
> >
> > As a matter of fact, what I think this boils to, is to consider if we
> > want to temporary drop the performance state vote for a device from
> > genpd's ->runtime_suspend() callback. Thus, also restore the vote from
> > genpd's ->runtime_resume() callback. That's because, this is directly
> > related to whether genpd should care about whether it's powered on or
> > off, when calling the ->set_performance_state(). We have had
> > discussions at LKML already around this topic. It seems like we need
> > to pick them up to reach a consensus, before we can move forward with
> > this.
>
> What are we trying to gain consensus on exactly?

Apologize for not being more clear. Let me re-phrase the question.

Should genpd (via genpd's internal runtime suspend callback) reset the
performance state for a device to zero, when that device becomes
runtime suspended?

Of course, if such reset is done, then genpd should also re-compute
the aggregation of the performance states for the corresponding PM
domain (and its master domains). Consequentially, these operations
then needs to be reversed, when the device in question becomes runtime
resumed.

>
> I've been thinking that we would want there to be a performance state
> 'target' for the 'devices are runtime suspended' state that we apply
> when the genpd is powered down from runtime PM suspend of all devices in
> the domain. This would then be overwritten with whatever the aggregated
> performance state is when the genpd is powered back on due to some
> device runtime resuming.

>From the above, I assume "target" to not always be zero. That confirms
my understanding from earlier discussions.

That is sort of what we already have today. The consumer driver are
free to change the performance state, via
dev_pm_genpd_set_performance_state() (or indirectly via
dev_pm_opp_set_rate()), for its device at any time. For example, it
may want to set a new performance state at runtime suspend.

> Don't we already sort of have support for this
> right now in genpd with commit fc5cbf0c94b6 ("PM / Domains: Support for
> multiple states")? So I would think that we either let that state_idx
> member be 0 or some implementation defined 'off' state index of the
> genpd that can be achieved.

Well, implementation details, but thanks for sharing your ideas.

> I was also thinking that the genpd_power_state was more than just a
> bunch of idle states of a device so that we could combine
> on/off/idle/active(performance states?) into one genpd_power_state
> space that is affected by idle and on/off runtime PM status.
>
> In the Qualcomm use-case, the state of the clk, either on or off, is
> factored into the decision to consider the voltage constraint that the
> clk has on the CX domain. So when the clk is disabled, the voltage
> constraint on CX can be removed/ignored in the aggregation equation
> because the hardware isn't clocking. This needs to work properly so that
> devices can disable their clks and save power.
>
> I hope that we can move clk on/off/rate control to be implemented with
> clk domains based upon genpds so that we can generically reason about
> genpds being on/off and at some performance state (i.e. a frequency)
> instead of making something clk specific in the genpd code. If we can do
> that, then this can be stated as a slave genpd (the clk domain) that's
> linked to a master genpd (the voltage/corner domain) can only affect the
> performance state requirement of the master genpd when the slave genpd
> is on. When the slave genpd is off, the performance state requirement is
> dropped and the master domain settles to a lower requirement based on
> the other domains linked to it that are on. The clk domain would turn on
> and off when the set of devices it is attached to are all suspended and
> that would "do the right thing" 

Re: [PATCH] power: supply: cpcap-battery: make array cpcap_battery_irqs static, shrinks object size

2018-12-05 Thread Sebastian Reichel
Hi,

On Thu, Nov 29, 2018 at 11:03:42PM +, Colin King wrote:
> From: Colin Ian King 
> 
> Don't populate the array cpcap_battery_irqs  on the stack but instead
> make it static. Makes the object code smaller by 99 bytes:
> 
> Before:
>text  data bss dec hex filename
>   13673  2448   0   161213ef9 cpcap-battery.o
> 
> After:
>text  data bss dec hex filename
>   13510  2512   0   160223e96 cpcap-battery.o
> 
> (gcc version 8.2.0 x86_64)
> 
> Signed-off-by: Colin Ian King 
> ---

Thanks, queued.

-- Sebastian

>  drivers/power/supply/cpcap-battery.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/power/supply/cpcap-battery.c 
> b/drivers/power/supply/cpcap-battery.c
> index 98ba07869c3b..204e22e3e9b4 100644
> --- a/drivers/power/supply/cpcap-battery.c
> +++ b/drivers/power/supply/cpcap-battery.c
> @@ -620,7 +620,7 @@ static int cpcap_battery_init_irq(struct platform_device 
> *pdev,
>  static int cpcap_battery_init_interrupts(struct platform_device *pdev,
>struct cpcap_battery_ddata *ddata)
>  {
> - const char * const cpcap_battery_irqs[] = {
> + static const char * const cpcap_battery_irqs[] = {
>   "eol", "lowbph", "lowbpl",
>   "chrgcurr1", "battdetb"
>   };
> -- 
> 2.19.1
> 


signature.asc
Description: PGP signature


Re: [PATCH] power: supply: cpcap-battery: make array cpcap_battery_irqs static, shrinks object size

2018-12-05 Thread Sebastian Reichel
Hi,

On Thu, Nov 29, 2018 at 11:03:42PM +, Colin King wrote:
> From: Colin Ian King 
> 
> Don't populate the array cpcap_battery_irqs  on the stack but instead
> make it static. Makes the object code smaller by 99 bytes:
> 
> Before:
>text  data bss dec hex filename
>   13673  2448   0   161213ef9 cpcap-battery.o
> 
> After:
>text  data bss dec hex filename
>   13510  2512   0   160223e96 cpcap-battery.o
> 
> (gcc version 8.2.0 x86_64)
> 
> Signed-off-by: Colin Ian King 
> ---

Thanks, queued.

-- Sebastian

>  drivers/power/supply/cpcap-battery.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/power/supply/cpcap-battery.c 
> b/drivers/power/supply/cpcap-battery.c
> index 98ba07869c3b..204e22e3e9b4 100644
> --- a/drivers/power/supply/cpcap-battery.c
> +++ b/drivers/power/supply/cpcap-battery.c
> @@ -620,7 +620,7 @@ static int cpcap_battery_init_irq(struct platform_device 
> *pdev,
>  static int cpcap_battery_init_interrupts(struct platform_device *pdev,
>struct cpcap_battery_ddata *ddata)
>  {
> - const char * const cpcap_battery_irqs[] = {
> + static const char * const cpcap_battery_irqs[] = {
>   "eol", "lowbph", "lowbpl",
>   "chrgcurr1", "battdetb"
>   };
> -- 
> 2.19.1
> 


signature.asc
Description: PGP signature


[PATCH] ARM: dts: omap5: Fix dual-role mode on Super-Speed port

2018-12-05 Thread Roger Quadros
OMAP5's Super-Speed USB port has a software mailbox register
that needs to be fed with VBUS and ID events from an external
VBUS/ID comparator.

Without this, Host role will not work correctly.

Fixes: 656c1a65ab55 ("ARM: dts: omap5: enable OTG role for DWC3 controller")
Reported-by: H. Nikolaus Schaller 
Signed-off-by: Roger Quadros 
---

Tony,

Please included this in v4.20-rc cycle. Thanks.

cheers,
-roger

 arch/arm/boot/dts/omap5-board-common.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/boot/dts/omap5-board-common.dtsi 
b/arch/arm/boot/dts/omap5-board-common.dtsi
index bf7ca00..bc853eb 100644
--- a/arch/arm/boot/dts/omap5-board-common.dtsi
+++ b/arch/arm/boot/dts/omap5-board-common.dtsi
@@ -701,6 +701,7 @@
 };
 
  {
+   extcon = <_usb3>;
dr_mode = "otg";
 };
 
-- 
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki



[PATCH] ARM: dts: omap5: Fix dual-role mode on Super-Speed port

2018-12-05 Thread Roger Quadros
OMAP5's Super-Speed USB port has a software mailbox register
that needs to be fed with VBUS and ID events from an external
VBUS/ID comparator.

Without this, Host role will not work correctly.

Fixes: 656c1a65ab55 ("ARM: dts: omap5: enable OTG role for DWC3 controller")
Reported-by: H. Nikolaus Schaller 
Signed-off-by: Roger Quadros 
---

Tony,

Please included this in v4.20-rc cycle. Thanks.

cheers,
-roger

 arch/arm/boot/dts/omap5-board-common.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/boot/dts/omap5-board-common.dtsi 
b/arch/arm/boot/dts/omap5-board-common.dtsi
index bf7ca00..bc853eb 100644
--- a/arch/arm/boot/dts/omap5-board-common.dtsi
+++ b/arch/arm/boot/dts/omap5-board-common.dtsi
@@ -701,6 +701,7 @@
 };
 
  {
+   extcon = <_usb3>;
dr_mode = "otg";
 };
 
-- 
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki



Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()

2018-12-05 Thread Dave Hansen
On 12/4/18 6:13 PM, Jerome Glisse wrote:
> On Tue, Dec 04, 2018 at 05:06:49PM -0800, Dave Hansen wrote:
>> OK, but there are 1024*1024 matrix cells on a systems with 1024
>> proximity domains (ACPI term for NUMA node).  So it sounds like you are
>> proposing a million-directory approach.
> 
> No, pseudo code:
> struct list links;
> 
> for (unsigned r = 0; r < nrows; r++) {
> for (unsigned c = 0; c < ncolumns; c++) {
> if (!link_find(links, hmat[r][c].bandwidth,
>hmat[r][c].latency)) {
> link = link_new(hmat[r][c].bandwidth,
> hmat[r][c].latency);
> // add initiator and target correspond to that row
> // and columns to this new link
> list_add(, links);
> }
> }
> }
> 
> So all cells that have same property are under the same link. 

OK, so the "link" here is like a cable.  It's like saying, "we have a
network and everything is connected with an ethernet cable that can do
1gbit/sec".

But, what actually connects an initiator to a target?  I assume we still
need to know which link is used for each target/initiator pair.  Where
is that enumerated?

I think this just means we need a million symlinks to a "link" instead
of a million link directories.  Still not great.

> Note that userspace can parse all this once during its initialization
> and create pools of target to use.

It sounds like you're agreeing that there is too much data in this
interface for applications to _regularly_ parse it.  We need some
central thing that parses it all and caches the results.


Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()

2018-12-05 Thread Dave Hansen
On 12/4/18 6:13 PM, Jerome Glisse wrote:
> On Tue, Dec 04, 2018 at 05:06:49PM -0800, Dave Hansen wrote:
>> OK, but there are 1024*1024 matrix cells on a systems with 1024
>> proximity domains (ACPI term for NUMA node).  So it sounds like you are
>> proposing a million-directory approach.
> 
> No, pseudo code:
> struct list links;
> 
> for (unsigned r = 0; r < nrows; r++) {
> for (unsigned c = 0; c < ncolumns; c++) {
> if (!link_find(links, hmat[r][c].bandwidth,
>hmat[r][c].latency)) {
> link = link_new(hmat[r][c].bandwidth,
> hmat[r][c].latency);
> // add initiator and target correspond to that row
> // and columns to this new link
> list_add(, links);
> }
> }
> }
> 
> So all cells that have same property are under the same link. 

OK, so the "link" here is like a cable.  It's like saying, "we have a
network and everything is connected with an ethernet cable that can do
1gbit/sec".

But, what actually connects an initiator to a target?  I assume we still
need to know which link is used for each target/initiator pair.  Where
is that enumerated?

I think this just means we need a million symlinks to a "link" instead
of a million link directories.  Still not great.

> Note that userspace can parse all this once during its initialization
> and create pools of target to use.

It sounds like you're agreeing that there is too much data in this
interface for applications to _regularly_ parse it.  We need some
central thing that parses it all and caches the results.


<    5   6   7   8   9   10   11   12   13   14   >