Re: [PATCH 2/2] ARM: Wrap '--pic-veneer' with ld-option
On Wed, Dec 5, 2018 at 12:10 AM Ard Biesheuvel wrote: > > (+ Arnd) > > On Wed, 5 Dec 2018 at 09:06, Nathan Chancellor > wrote: > > > > On Wed, Dec 05, 2018 at 08:37:05AM +0100, Ard Biesheuvel wrote: > > > On Wed, 5 Dec 2018 at 02:42, Nathan Chancellor > > > wrote: > > > > > > > > This flag is not supported by lld: > > > > > > > > ld.lld: error: unknown argument: --pic-veneer > > > > > > > > Signed-off-by: Nathan Chancellor > > > > > > Hi Nate, > > > > > > Does this mean ld.lld is guaranteed to produce position independent > > > veneers if you build kernels that are bigger than the typical range of > > > a relative branch? (+ Peter) who might be able to answer this. > > > > > > > Hi Ard, > > > > Honestly, I'm not quite sure. I saw your commit that introduced this > > flag and I wasn't quite sure what to make of it for lld. What > > configuration would I use to verify and what would I check for? > > > > Try building allyesconfig, and check the resulting binary for veneers > (which have 'veneer' in the symbol name, at least when ld.bfd emits > them). These veneers should not take the [virtual] address of the > branch target directly, but take a PC relative offset (as in the > example in the commit log of that patch you are referring to) Not sure clang can compile an arm32 allyesconfig (haven't tried personally, yet, TODO), but we could try gcc+lld. > > > Additionally, I have filed an LLVM bug for the lld developers to > > check and see if this is a flag they should support: > > > > https://bugs.llvm.org/show_bug.cgi?id=39886 > > > > Thanks for the quick reply, > > Nathan > > > > > > --- > > > > arch/arm/Makefile | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/arch/arm/Makefile b/arch/arm/Makefile > > > > index e2a0baf36766..4fab2aa29570 100644 > > > > --- a/arch/arm/Makefile > > > > +++ b/arch/arm/Makefile > > > > @@ -10,7 +10,7 @@ > > > > # > > > > # Copyright (C) 1995-2001 by Russell King > > > > > > > > -LDFLAGS_vmlinux:= --no-undefined -X --pic-veneer > > > > +LDFLAGS_vmlinux:= --no-undefined -X $(call > > > > ld-option,--pic-veneer) > > > > ifeq ($(CONFIG_CPU_ENDIAN_BE8),y) > > > > LDFLAGS_vmlinux+= --be8 > > > > KBUILD_LDFLAGS_MODULE += --be8 > > > > -- > > > > 2.20.0.rc1 > > > > -- Thanks, ~Nick Desaulniers
Re: [PATCH 2/2] ARM: Wrap '--pic-veneer' with ld-option
On Wed, Dec 5, 2018 at 12:10 AM Ard Biesheuvel wrote: > > (+ Arnd) > > On Wed, 5 Dec 2018 at 09:06, Nathan Chancellor > wrote: > > > > On Wed, Dec 05, 2018 at 08:37:05AM +0100, Ard Biesheuvel wrote: > > > On Wed, 5 Dec 2018 at 02:42, Nathan Chancellor > > > wrote: > > > > > > > > This flag is not supported by lld: > > > > > > > > ld.lld: error: unknown argument: --pic-veneer > > > > > > > > Signed-off-by: Nathan Chancellor > > > > > > Hi Nate, > > > > > > Does this mean ld.lld is guaranteed to produce position independent > > > veneers if you build kernels that are bigger than the typical range of > > > a relative branch? (+ Peter) who might be able to answer this. > > > > > > > Hi Ard, > > > > Honestly, I'm not quite sure. I saw your commit that introduced this > > flag and I wasn't quite sure what to make of it for lld. What > > configuration would I use to verify and what would I check for? > > > > Try building allyesconfig, and check the resulting binary for veneers > (which have 'veneer' in the symbol name, at least when ld.bfd emits > them). These veneers should not take the [virtual] address of the > branch target directly, but take a PC relative offset (as in the > example in the commit log of that patch you are referring to) Not sure clang can compile an arm32 allyesconfig (haven't tried personally, yet, TODO), but we could try gcc+lld. > > > Additionally, I have filed an LLVM bug for the lld developers to > > check and see if this is a flag they should support: > > > > https://bugs.llvm.org/show_bug.cgi?id=39886 > > > > Thanks for the quick reply, > > Nathan > > > > > > --- > > > > arch/arm/Makefile | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/arch/arm/Makefile b/arch/arm/Makefile > > > > index e2a0baf36766..4fab2aa29570 100644 > > > > --- a/arch/arm/Makefile > > > > +++ b/arch/arm/Makefile > > > > @@ -10,7 +10,7 @@ > > > > # > > > > # Copyright (C) 1995-2001 by Russell King > > > > > > > > -LDFLAGS_vmlinux:= --no-undefined -X --pic-veneer > > > > +LDFLAGS_vmlinux:= --no-undefined -X $(call > > > > ld-option,--pic-veneer) > > > > ifeq ($(CONFIG_CPU_ENDIAN_BE8),y) > > > > LDFLAGS_vmlinux+= --be8 > > > > KBUILD_LDFLAGS_MODULE += --be8 > > > > -- > > > > 2.20.0.rc1 > > > > -- Thanks, ~Nick Desaulniers
Re: [PATCH v3] signal: add procfd_send_signal() syscall
Christian Brauner writes: > The kill() syscall operates on process identifiers (pid). After a process > has exited its pid can be reused by another process. If a caller sends a > signal to a reused pid it will end up signaling the wrong process. This > issue has often surfaced and there has been a push [1] to address this > problem. > > This patch uses file descriptors (fd) from proc/ as stable handles on > struct pid. Even if a pid is recycled the handle will not change. The fd > can be used to send signals to the process it refers to. > Thus, the new syscall procfd_send_signal() is introduced to solve this > problem. Instead of pids it operates on process fds (procfd). > > /* prototype and argument /* > long procfd_send_signal(int fd, int sig, siginfo_t *info, unsigned int flags); > > In addition to the procfd and signal argument it takes an additional > siginfo_t and flags argument. If the siginfo_t argument is NULL then > procfd_send_signal() behaves like kill(). If it is not NULL > procfd_send_signal() behaves like rt_sigqueueinfo(). > The flags argument is added to allow for future extensions of this syscall. > It currently needs to be passed as 0. Failing to do so will cause EINVAL. > > /* procfd_send_signal() replaces multiple pid-based syscalls */ > The procfd_send_signal() syscall currently takes on the job of the > following syscalls that operate on pids: > - kill(2) > - rt_sigqueueinfo(2) > The syscall is defined in such a way that it can also operate on thread fds > instead of process fds. In a future patchset I will extend it to operate on > procfds from /proc//task/ at which point it will additionally > take on the job of: > - tgkill(2) > - rt_tgsigqueueinfo(2) > Right now this is intentionally left out to keep this patchset as simple as > possible (cf. [4]). If a procfd of /proc//task/ is passed > EOPNOTSUPP will be returned to give userspace a way to detect when I add > support for such procfds {cf. [10]). > > /* naming */ > The name procfd_send_signal() was chosen to reflect the fact that it takes > on the job of multiple syscalls. It is intentional that the name is not > reminiscent of neither kill(2) nor rt_sigqueueinfo(2). Not the fomer > because it might imply that procfd_send_signal() is only a replacement for > kill(2) and not the latter because it is a hazzle to remember the correct > spelling (especially for non-native speakers) and because it is not > descriptive enough of what the syscall actually does. The name > "procfd_send_signal" makes it very clear that its job is to send signals. > > /* O_PATH file descriptors */ > procfds opened as O_PATH fds cannot be used to send signals to a process > (cf. [2]). Signaling processes through procfds is the equivalent of writing > to a file. Thus, this is not an operation that operates "purely at the > file descriptor level" as required by the open(2) manpage. > > /* zombies */ > Zombies can be signaled just as any other process. No special error will be > reported since a zombie state is an unreliable state (cf. [3]). > > /* cross-namespace signals */ > The patch currently enforces that the signaler and signalee either are in > the same pid namespace or that the signaler's pid namespace is an ancestor > of the signalee's pid namespace. This is done for the sake of simplicity > and because it is unclear to what values certain members of struct > siginfo_t would need to be set to (cf. [5], [6]). > > /* compat syscalls */ > It became clear that we would like to avoid adding compat syscalls (cf. > [7]). The compat syscall handling is now done in kernel/signal.c itself by > adding __copy_siginfo_from_user_generic() which lets us avoid compat > syscalls (cf. [8]). > With upcoming rework for syscall handling things might improve even further > (cf. [11]). > This patch was tested on x64 and x86. > > /* userspace usage */ > An asciinema recording for the basic functionality can be found under [9]. > With this patch a process can be killed via: > > #define _GNU_SOURCE > #include > #include > #include > #include > #include > #include > #include > #include > #include > #include > > static inline int do_procfd_send_signal(int procfd, int sig, siginfo_t *info, > unsigned int flags) > { > #ifdef __NR_procfd_send_signal > return syscall(__NR_procfd_send_signal, procfd, sig, info, flags); > #else > return -ENOSYS; > #endif > } > > int main(int argc, char *argv[]) > { > int fd, ret, saved_errno, sig; > > if (argc < 3) > exit(EXIT_FAILURE); > > fd = open(argv[1], O_DIRECTORY | O_CLOEXEC); > if (fd < 0) { > printf("%s - Failed to open \"%s\"\n", strerror(errno), > argv[1]); > exit(EXIT_FAILURE); > } > > sig = atoi(argv[2]); > > printf("Sending signal %d to process %s\n", sig, argv[1]); > ret = do_procfd_send_signal(fd, sig, NULL, 0); > >
Re: [PATCH v3] signal: add procfd_send_signal() syscall
Christian Brauner writes: > The kill() syscall operates on process identifiers (pid). After a process > has exited its pid can be reused by another process. If a caller sends a > signal to a reused pid it will end up signaling the wrong process. This > issue has often surfaced and there has been a push [1] to address this > problem. > > This patch uses file descriptors (fd) from proc/ as stable handles on > struct pid. Even if a pid is recycled the handle will not change. The fd > can be used to send signals to the process it refers to. > Thus, the new syscall procfd_send_signal() is introduced to solve this > problem. Instead of pids it operates on process fds (procfd). > > /* prototype and argument /* > long procfd_send_signal(int fd, int sig, siginfo_t *info, unsigned int flags); > > In addition to the procfd and signal argument it takes an additional > siginfo_t and flags argument. If the siginfo_t argument is NULL then > procfd_send_signal() behaves like kill(). If it is not NULL > procfd_send_signal() behaves like rt_sigqueueinfo(). > The flags argument is added to allow for future extensions of this syscall. > It currently needs to be passed as 0. Failing to do so will cause EINVAL. > > /* procfd_send_signal() replaces multiple pid-based syscalls */ > The procfd_send_signal() syscall currently takes on the job of the > following syscalls that operate on pids: > - kill(2) > - rt_sigqueueinfo(2) > The syscall is defined in such a way that it can also operate on thread fds > instead of process fds. In a future patchset I will extend it to operate on > procfds from /proc//task/ at which point it will additionally > take on the job of: > - tgkill(2) > - rt_tgsigqueueinfo(2) > Right now this is intentionally left out to keep this patchset as simple as > possible (cf. [4]). If a procfd of /proc//task/ is passed > EOPNOTSUPP will be returned to give userspace a way to detect when I add > support for such procfds {cf. [10]). > > /* naming */ > The name procfd_send_signal() was chosen to reflect the fact that it takes > on the job of multiple syscalls. It is intentional that the name is not > reminiscent of neither kill(2) nor rt_sigqueueinfo(2). Not the fomer > because it might imply that procfd_send_signal() is only a replacement for > kill(2) and not the latter because it is a hazzle to remember the correct > spelling (especially for non-native speakers) and because it is not > descriptive enough of what the syscall actually does. The name > "procfd_send_signal" makes it very clear that its job is to send signals. > > /* O_PATH file descriptors */ > procfds opened as O_PATH fds cannot be used to send signals to a process > (cf. [2]). Signaling processes through procfds is the equivalent of writing > to a file. Thus, this is not an operation that operates "purely at the > file descriptor level" as required by the open(2) manpage. > > /* zombies */ > Zombies can be signaled just as any other process. No special error will be > reported since a zombie state is an unreliable state (cf. [3]). > > /* cross-namespace signals */ > The patch currently enforces that the signaler and signalee either are in > the same pid namespace or that the signaler's pid namespace is an ancestor > of the signalee's pid namespace. This is done for the sake of simplicity > and because it is unclear to what values certain members of struct > siginfo_t would need to be set to (cf. [5], [6]). > > /* compat syscalls */ > It became clear that we would like to avoid adding compat syscalls (cf. > [7]). The compat syscall handling is now done in kernel/signal.c itself by > adding __copy_siginfo_from_user_generic() which lets us avoid compat > syscalls (cf. [8]). > With upcoming rework for syscall handling things might improve even further > (cf. [11]). > This patch was tested on x64 and x86. > > /* userspace usage */ > An asciinema recording for the basic functionality can be found under [9]. > With this patch a process can be killed via: > > #define _GNU_SOURCE > #include > #include > #include > #include > #include > #include > #include > #include > #include > #include > > static inline int do_procfd_send_signal(int procfd, int sig, siginfo_t *info, > unsigned int flags) > { > #ifdef __NR_procfd_send_signal > return syscall(__NR_procfd_send_signal, procfd, sig, info, flags); > #else > return -ENOSYS; > #endif > } > > int main(int argc, char *argv[]) > { > int fd, ret, saved_errno, sig; > > if (argc < 3) > exit(EXIT_FAILURE); > > fd = open(argv[1], O_DIRECTORY | O_CLOEXEC); > if (fd < 0) { > printf("%s - Failed to open \"%s\"\n", strerror(errno), > argv[1]); > exit(EXIT_FAILURE); > } > > sig = atoi(argv[2]); > > printf("Sending signal %d to process %s\n", sig, argv[1]); > ret = do_procfd_send_signal(fd, sig, NULL, 0); > >
Re: [GIT PULL] MFD fixes for v4.20
The pull request you sent on Wed, 5 Dec 2018 10:08:56 +: > git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git mfd-fixes-4.20 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/5e0dc1a7ce1f322c06e993845def2c73cfc485fd Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] MFD fixes for v4.20
The pull request you sent on Wed, 5 Dec 2018 10:08:56 +: > git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git mfd-fixes-4.20 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/5e0dc1a7ce1f322c06e993845def2c73cfc485fd Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] parisc architecture fix for kernel v4.20-rc6
The pull request you sent on Tue, 4 Dec 2018 16:00:32 +0100: > git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux.git > parisc-4.20-4 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/a517af52cb57f972cee2378765e1c5dd10141f0d Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [RFC PATCH 02/14] mm/hms: heterogenenous memory system (HMS) documentation
On 2018-12-05 11:07 a.m., Jerome Glisse wrote: >> Well multiple links are easy when you have a 'link' bus. Just add >> another link device under the bus. > > So you are telling do what i am doing in this patchset but not under > HMS directory ? No, it's completely different. I'm talking about creating a bus to describe only the real hardware that links GPUs. Not creating a new virtual tree containing a bunch of duplicate bus and device information that already exists currently in sysfs. >> >> Technically, the accessibility issue is already encoded in sysfs. For >> example, through the PCI tree you can determine which ACS bits are set >> and determine which devices are behind the same root bridge the same way >> we do in the kernel p2pdma subsystem. This is all bus specific which is >> fine, but if we want to change that, we should have a common way for >> existing buses to describe these attributes in the existing tree. The >> new 'link' bus devices would have to have some way to describe cases if >> memory isn't accessible in some way across it. > > What i am looking at is much more complex than just access bit. It > is a whole set of properties attach to each path (can it be cache > coherent ? can it do atomic ? what is the access granularity ? what > is the bandwidth ? is it dedicated link ? ...) I'm not talking about just an access bit. I'm talking about what you are describing: standard ways for *existing* buses in the sysfs hierarchy to describe things like cache coherency, atomics, granularity, etc without creating a new hierarchy. > On top of that i argue that more people would use that information if it > were available to them. I agree that i have no hard evidence to back that > up and that it is just a feeling but you can not disprove me either as > this is a chicken and egg problem, you can not prove people will not use > an API if the API is not there to be use. And you miss my point that much of this information is already available to them. And more can be added in the existing framework without creating any brand new concepts. I haven't said anything about chicken-and-egg problems -- I've given you a bunch of different suggestions to split this up into more managable problems and address many of them within the APIs and frameworks we have already. Logan
Re: [RFC PATCH 02/14] mm/hms: heterogenenous memory system (HMS) documentation
On 2018-12-05 11:07 a.m., Jerome Glisse wrote: >> Well multiple links are easy when you have a 'link' bus. Just add >> another link device under the bus. > > So you are telling do what i am doing in this patchset but not under > HMS directory ? No, it's completely different. I'm talking about creating a bus to describe only the real hardware that links GPUs. Not creating a new virtual tree containing a bunch of duplicate bus and device information that already exists currently in sysfs. >> >> Technically, the accessibility issue is already encoded in sysfs. For >> example, through the PCI tree you can determine which ACS bits are set >> and determine which devices are behind the same root bridge the same way >> we do in the kernel p2pdma subsystem. This is all bus specific which is >> fine, but if we want to change that, we should have a common way for >> existing buses to describe these attributes in the existing tree. The >> new 'link' bus devices would have to have some way to describe cases if >> memory isn't accessible in some way across it. > > What i am looking at is much more complex than just access bit. It > is a whole set of properties attach to each path (can it be cache > coherent ? can it do atomic ? what is the access granularity ? what > is the bandwidth ? is it dedicated link ? ...) I'm not talking about just an access bit. I'm talking about what you are describing: standard ways for *existing* buses in the sysfs hierarchy to describe things like cache coherency, atomics, granularity, etc without creating a new hierarchy. > On top of that i argue that more people would use that information if it > were available to them. I agree that i have no hard evidence to back that > up and that it is just a feeling but you can not disprove me either as > this is a chicken and egg problem, you can not prove people will not use > an API if the API is not there to be use. And you miss my point that much of this information is already available to them. And more can be added in the existing framework without creating any brand new concepts. I haven't said anything about chicken-and-egg problems -- I've given you a bunch of different suggestions to split this up into more managable problems and address many of them within the APIs and frameworks we have already. Logan
Re: [GIT PULL] parisc architecture fix for kernel v4.20-rc6
The pull request you sent on Tue, 4 Dec 2018 16:00:32 +0100: > git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux.git > parisc-4.20-4 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/a517af52cb57f972cee2378765e1c5dd10141f0d Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] Power management fix for v4.20-rc6
The pull request you sent on Wed, 5 Dec 2018 12:53:50 +0100: > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git pm-4.20-rc6 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/91dd51f6e7ce7f1eba5d7b583790e134367d0fbb Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] Power management fix for v4.20-rc6
The pull request you sent on Wed, 5 Dec 2018 12:53:50 +0100: > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git pm-4.20-rc6 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/91dd51f6e7ce7f1eba5d7b583790e134367d0fbb Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [git pull] Input updates for v4.20-rc5
The pull request you sent on Mon, 3 Dec 2018 19:37:46 -0800: > git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git for-linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/adac0753c25217a2365b132c87cb2540b51fa89b Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [git pull] Input updates for v4.20-rc5
The pull request you sent on Mon, 3 Dec 2018 19:37:46 -0800: > git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git for-linus has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/adac0753c25217a2365b132c87cb2540b51fa89b Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [PATCH v3] signal: add procfd_send_signal() syscall
Christian Brauner writes: > The kill() syscall operates on process identifiers (pid). After a process > has exited its pid can be reused by another process. If a caller sends a > signal to a reused pid it will end up signaling the wrong process. This > issue has often surfaced and there has been a push [1] to address this > problem. > > This patch uses file descriptors (fd) from proc/ as stable handles on > struct pid. Even if a pid is recycled the handle will not change. The fd > can be used to send signals to the process it refers to. > Thus, the new syscall procfd_send_signal() is introduced to solve this > problem. Instead of pids it operates on process fds (procfd). > > /* prototype and argument /* > long procfd_send_signal(int fd, int sig, siginfo_t *info, unsigned int flags); > > In addition to the procfd and signal argument it takes an additional > siginfo_t and flags argument. If the siginfo_t argument is NULL then > procfd_send_signal() behaves like kill(). If it is not NULL > procfd_send_signal() behaves like rt_sigqueueinfo(). > The flags argument is added to allow for future extensions of this syscall. > It currently needs to be passed as 0. Failing to do so will cause EINVAL. > > /* procfd_send_signal() replaces multiple pid-based syscalls */ > The procfd_send_signal() syscall currently takes on the job of the > following syscalls that operate on pids: > - kill(2) > - rt_sigqueueinfo(2) > The syscall is defined in such a way that it can also operate on thread fds > instead of process fds. In a future patchset I will extend it to operate on > procfds from /proc//task/ at which point it will additionally > take on the job of: > - tgkill(2) > - rt_tgsigqueueinfo(2) > Right now this is intentionally left out to keep this patchset as simple as > possible (cf. [4]). If a procfd of /proc//task/ is passed > EOPNOTSUPP will be returned to give userspace a way to detect when I add > support for such procfds {cf. [10]). > > /* naming */ > The name procfd_send_signal() was chosen to reflect the fact that it takes > on the job of multiple syscalls. It is intentional that the name is not > reminiscent of neither kill(2) nor rt_sigqueueinfo(2). Not the fomer > because it might imply that procfd_send_signal() is only a replacement for > kill(2) and not the latter because it is a hazzle to remember the correct > spelling (especially for non-native speakers) and because it is not > descriptive enough of what the syscall actually does. The name > "procfd_send_signal" makes it very clear that its job is to send signals. > > /* O_PATH file descriptors */ > procfds opened as O_PATH fds cannot be used to send signals to a process > (cf. [2]). Signaling processes through procfds is the equivalent of writing > to a file. Thus, this is not an operation that operates "purely at the > file descriptor level" as required by the open(2) manpage. > > /* zombies */ > Zombies can be signaled just as any other process. No special error will be > reported since a zombie state is an unreliable state (cf. [3]). > > /* cross-namespace signals */ > The patch currently enforces that the signaler and signalee either are in > the same pid namespace or that the signaler's pid namespace is an ancestor > of the signalee's pid namespace. This is done for the sake of simplicity > and because it is unclear to what values certain members of struct > siginfo_t would need to be set to (cf. [5], [6]). > > /* compat syscalls */ > It became clear that we would like to avoid adding compat syscalls (cf. > [7]). The compat syscall handling is now done in kernel/signal.c itself by > adding __copy_siginfo_from_user_generic() which lets us avoid compat > syscalls (cf. [8]). > With upcoming rework for syscall handling things might improve even further > (cf. [11]). > This patch was tested on x64 and x86. > > /* userspace usage */ > An asciinema recording for the basic functionality can be found under [9]. > With this patch a process can be killed via: > > #define _GNU_SOURCE > #include > #include > #include > #include > #include > #include > #include > #include > #include > #include > > static inline int do_procfd_send_signal(int procfd, int sig, siginfo_t *info, > unsigned int flags) > { > #ifdef __NR_procfd_send_signal > return syscall(__NR_procfd_send_signal, procfd, sig, info, flags); > #else > return -ENOSYS; > #endif > } > > int main(int argc, char *argv[]) > { > int fd, ret, saved_errno, sig; > > if (argc < 3) > exit(EXIT_FAILURE); > > fd = open(argv[1], O_DIRECTORY | O_CLOEXEC); > if (fd < 0) { > printf("%s - Failed to open \"%s\"\n", strerror(errno), > argv[1]); > exit(EXIT_FAILURE); > } > > sig = atoi(argv[2]); > > printf("Sending signal %d to process %s\n", sig, argv[1]); > ret = do_procfd_send_signal(fd, sig, NULL, 0); > >
Re: [PATCH v3] signal: add procfd_send_signal() syscall
Christian Brauner writes: > The kill() syscall operates on process identifiers (pid). After a process > has exited its pid can be reused by another process. If a caller sends a > signal to a reused pid it will end up signaling the wrong process. This > issue has often surfaced and there has been a push [1] to address this > problem. > > This patch uses file descriptors (fd) from proc/ as stable handles on > struct pid. Even if a pid is recycled the handle will not change. The fd > can be used to send signals to the process it refers to. > Thus, the new syscall procfd_send_signal() is introduced to solve this > problem. Instead of pids it operates on process fds (procfd). > > /* prototype and argument /* > long procfd_send_signal(int fd, int sig, siginfo_t *info, unsigned int flags); > > In addition to the procfd and signal argument it takes an additional > siginfo_t and flags argument. If the siginfo_t argument is NULL then > procfd_send_signal() behaves like kill(). If it is not NULL > procfd_send_signal() behaves like rt_sigqueueinfo(). > The flags argument is added to allow for future extensions of this syscall. > It currently needs to be passed as 0. Failing to do so will cause EINVAL. > > /* procfd_send_signal() replaces multiple pid-based syscalls */ > The procfd_send_signal() syscall currently takes on the job of the > following syscalls that operate on pids: > - kill(2) > - rt_sigqueueinfo(2) > The syscall is defined in such a way that it can also operate on thread fds > instead of process fds. In a future patchset I will extend it to operate on > procfds from /proc//task/ at which point it will additionally > take on the job of: > - tgkill(2) > - rt_tgsigqueueinfo(2) > Right now this is intentionally left out to keep this patchset as simple as > possible (cf. [4]). If a procfd of /proc//task/ is passed > EOPNOTSUPP will be returned to give userspace a way to detect when I add > support for such procfds {cf. [10]). > > /* naming */ > The name procfd_send_signal() was chosen to reflect the fact that it takes > on the job of multiple syscalls. It is intentional that the name is not > reminiscent of neither kill(2) nor rt_sigqueueinfo(2). Not the fomer > because it might imply that procfd_send_signal() is only a replacement for > kill(2) and not the latter because it is a hazzle to remember the correct > spelling (especially for non-native speakers) and because it is not > descriptive enough of what the syscall actually does. The name > "procfd_send_signal" makes it very clear that its job is to send signals. > > /* O_PATH file descriptors */ > procfds opened as O_PATH fds cannot be used to send signals to a process > (cf. [2]). Signaling processes through procfds is the equivalent of writing > to a file. Thus, this is not an operation that operates "purely at the > file descriptor level" as required by the open(2) manpage. > > /* zombies */ > Zombies can be signaled just as any other process. No special error will be > reported since a zombie state is an unreliable state (cf. [3]). > > /* cross-namespace signals */ > The patch currently enforces that the signaler and signalee either are in > the same pid namespace or that the signaler's pid namespace is an ancestor > of the signalee's pid namespace. This is done for the sake of simplicity > and because it is unclear to what values certain members of struct > siginfo_t would need to be set to (cf. [5], [6]). > > /* compat syscalls */ > It became clear that we would like to avoid adding compat syscalls (cf. > [7]). The compat syscall handling is now done in kernel/signal.c itself by > adding __copy_siginfo_from_user_generic() which lets us avoid compat > syscalls (cf. [8]). > With upcoming rework for syscall handling things might improve even further > (cf. [11]). > This patch was tested on x64 and x86. > > /* userspace usage */ > An asciinema recording for the basic functionality can be found under [9]. > With this patch a process can be killed via: > > #define _GNU_SOURCE > #include > #include > #include > #include > #include > #include > #include > #include > #include > #include > > static inline int do_procfd_send_signal(int procfd, int sig, siginfo_t *info, > unsigned int flags) > { > #ifdef __NR_procfd_send_signal > return syscall(__NR_procfd_send_signal, procfd, sig, info, flags); > #else > return -ENOSYS; > #endif > } > > int main(int argc, char *argv[]) > { > int fd, ret, saved_errno, sig; > > if (argc < 3) > exit(EXIT_FAILURE); > > fd = open(argv[1], O_DIRECTORY | O_CLOEXEC); > if (fd < 0) { > printf("%s - Failed to open \"%s\"\n", strerror(errno), > argv[1]); > exit(EXIT_FAILURE); > } > > sig = atoi(argv[2]); > > printf("Sending signal %d to process %s\n", sig, argv[1]); > ret = do_procfd_send_signal(fd, sig, NULL, 0); > >
Re: [PATCH 2/5] ARC: perf: introduce Kernel PMU events support
On 12/5/18 9:06 AM, Eugeniy Paltsev wrote: > Export all available ARC architected hardware events as > kernel PMU events to make non-generic events accessible. > > ARC PMU HW allow us to read the list of all available > events names. So we generate kernel PMU event list > dynamically in arc_pmu_device_probe() using > human-readable events names we got from HW instead of > using pre-defined events list. > > -->8-- > $ perf list > [snip] > arc_pmu/bdata64/ [Kernel PMU event] > arc_pmu/bdcstall/ [Kernel PMU event] > arc_pmu/bdslot/ [Kernel PMU event] > arc_pmu/bfbmp/[Kernel PMU event] > arc_pmu/bfirqex/ [Kernel PMU event] > arc_pmu/bflgstal/ [Kernel PMU event] > arc_pmu/bflush/ [Kernel PMU event] Lets call this pct iso pmu since pmu has more of power mgmt connotation. I know the code has pmu littered all over but atleast the user interface could be more intuitive. BTW this approach seems more user friendly and is different from Alexey's earlier stab at implementing raw events [1]. He didn't keep around any list (and relied on use rlooking in the PRM to find his interesting event of the day) and pass as *r*foobar. PeterZ at the time had some reservations with that which I never fully understood. [1] https://lore.kernel.org/patchwork/patch/568769/ > -->8-- > > Signed-off-by: Eugeniy Paltsev > --- > arch/arc/kernel/perf_event.c | 107 > ++- > 1 file changed, 106 insertions(+), 1 deletion(-) > > diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c > index 811a07a2ca21..97b88b00c418 100644 > --- a/arch/arc/kernel/perf_event.c > +++ b/arch/arc/kernel/perf_event.c > @@ -22,12 +22,28 @@ > /* HW holds 8 symbols + one for null terminator */ > #define ARCPMU_EVENT_NAME_LEN9 > > +enum arc_pmu_attr_groups { > + ARCPMU_ATTR_GR_EVENTS, > + ARCPMU_ATTR_GR_FORMATS, > + ARCPMU_NR_ATTR_GR > +}; > + > +struct arc_pmu_raw_event_entry { > + char name[ARCPMU_EVENT_NAME_LEN]; > +}; > + > struct arc_pmu { > struct pmu pmu; > unsigned intirq; > int n_counters; > + int n_events; > u64 max_period; > int ev_hw_idx[PERF_COUNT_ARC_HW_MAX]; > + > + struct arc_pmu_raw_event_entry *raw_entry; > + struct attribute**attrs; > + struct perf_pmu_events_attr *attr; > + const struct attribute_group*attr_groups[ARCPMU_NR_ATTR_GR + 1]; > }; > > struct arc_pmu_cpu { > @@ -196,6 +212,17 @@ static int arc_pmu_event_init(struct perf_event *event) >(int)hwc->config, arc_pmu_ev_hw_map[ret]); > return 0; > > + case PERF_TYPE_RAW: > + if (event->attr.config >= arc_pmu->n_events) > + return -ENOENT; > + > + hwc->config |= event->attr.config; > + pr_debug("init raw event with idx %lld \'%s\'\n", > + event->attr.config, > + arc_pmu->raw_entry[event->attr.config].name); > + > + return 0; > + > default: > return -ENOENT; > } > @@ -446,6 +473,68 @@ static void arc_cpu_pmu_irq_init(void *data) > write_aux_reg(ARC_REG_PCT_INT_ACT, 0x); > } > > +/* Event field occupies the bottom 15 bits of our config field */ > +PMU_FORMAT_ATTR(event, "config:0-14"); > +static struct attribute *arc_pmu_format_attrs[] = { > + _attr_event.attr, > + NULL, > +}; > + > +static struct attribute_group arc_pmu_format_attr_gr = { > + .name = "format", > + .attrs = arc_pmu_format_attrs, > +}; > + > +static ssize_t > +arc_pmu_events_sysfs_show(struct device *dev, > + struct device_attribute *attr, char *page) > +{ > + struct perf_pmu_events_attr *pmu_attr; > + > + pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr); > + return sprintf(page, "event=0x%04llx\n", pmu_attr->id); > +} > + > +/* > + * We don't add attrs here as we don't have pre-defined list of perf events. > + * We will generate and add attrs dynamically in probe() after we read HW > + * configuration. > + */ > +static struct attribute_group arc_pmu_events_attr_gr = { > + .name = "events", > +}; > + > +static void arc_pmu_add_raw_event_attr(int j, char *str) > +{ > + memmove(arc_pmu->raw_entry[j].name, str, ARCPMU_EVENT_NAME_LEN - 1); > + arc_pmu->attr[j].attr.attr.name = arc_pmu->raw_entry[j].name; > + arc_pmu->attr[j].attr.attr.mode = VERIFY_OCTAL_PERMISSIONS(0444); > + arc_pmu->attr[j].attr.show = arc_pmu_events_sysfs_show; > + arc_pmu->attr[j].id = j; > + arc_pmu->attrs[j] = &(arc_pmu->attr[j].attr.attr); > +} > + > +static int arc_pmu_raw_alloc(struct device *dev) > +{ > +
Re: [PATCH 2/5] ARC: perf: introduce Kernel PMU events support
On 12/5/18 9:06 AM, Eugeniy Paltsev wrote: > Export all available ARC architected hardware events as > kernel PMU events to make non-generic events accessible. > > ARC PMU HW allow us to read the list of all available > events names. So we generate kernel PMU event list > dynamically in arc_pmu_device_probe() using > human-readable events names we got from HW instead of > using pre-defined events list. > > -->8-- > $ perf list > [snip] > arc_pmu/bdata64/ [Kernel PMU event] > arc_pmu/bdcstall/ [Kernel PMU event] > arc_pmu/bdslot/ [Kernel PMU event] > arc_pmu/bfbmp/[Kernel PMU event] > arc_pmu/bfirqex/ [Kernel PMU event] > arc_pmu/bflgstal/ [Kernel PMU event] > arc_pmu/bflush/ [Kernel PMU event] Lets call this pct iso pmu since pmu has more of power mgmt connotation. I know the code has pmu littered all over but atleast the user interface could be more intuitive. BTW this approach seems more user friendly and is different from Alexey's earlier stab at implementing raw events [1]. He didn't keep around any list (and relied on use rlooking in the PRM to find his interesting event of the day) and pass as *r*foobar. PeterZ at the time had some reservations with that which I never fully understood. [1] https://lore.kernel.org/patchwork/patch/568769/ > -->8-- > > Signed-off-by: Eugeniy Paltsev > --- > arch/arc/kernel/perf_event.c | 107 > ++- > 1 file changed, 106 insertions(+), 1 deletion(-) > > diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c > index 811a07a2ca21..97b88b00c418 100644 > --- a/arch/arc/kernel/perf_event.c > +++ b/arch/arc/kernel/perf_event.c > @@ -22,12 +22,28 @@ > /* HW holds 8 symbols + one for null terminator */ > #define ARCPMU_EVENT_NAME_LEN9 > > +enum arc_pmu_attr_groups { > + ARCPMU_ATTR_GR_EVENTS, > + ARCPMU_ATTR_GR_FORMATS, > + ARCPMU_NR_ATTR_GR > +}; > + > +struct arc_pmu_raw_event_entry { > + char name[ARCPMU_EVENT_NAME_LEN]; > +}; > + > struct arc_pmu { > struct pmu pmu; > unsigned intirq; > int n_counters; > + int n_events; > u64 max_period; > int ev_hw_idx[PERF_COUNT_ARC_HW_MAX]; > + > + struct arc_pmu_raw_event_entry *raw_entry; > + struct attribute**attrs; > + struct perf_pmu_events_attr *attr; > + const struct attribute_group*attr_groups[ARCPMU_NR_ATTR_GR + 1]; > }; > > struct arc_pmu_cpu { > @@ -196,6 +212,17 @@ static int arc_pmu_event_init(struct perf_event *event) >(int)hwc->config, arc_pmu_ev_hw_map[ret]); > return 0; > > + case PERF_TYPE_RAW: > + if (event->attr.config >= arc_pmu->n_events) > + return -ENOENT; > + > + hwc->config |= event->attr.config; > + pr_debug("init raw event with idx %lld \'%s\'\n", > + event->attr.config, > + arc_pmu->raw_entry[event->attr.config].name); > + > + return 0; > + > default: > return -ENOENT; > } > @@ -446,6 +473,68 @@ static void arc_cpu_pmu_irq_init(void *data) > write_aux_reg(ARC_REG_PCT_INT_ACT, 0x); > } > > +/* Event field occupies the bottom 15 bits of our config field */ > +PMU_FORMAT_ATTR(event, "config:0-14"); > +static struct attribute *arc_pmu_format_attrs[] = { > + _attr_event.attr, > + NULL, > +}; > + > +static struct attribute_group arc_pmu_format_attr_gr = { > + .name = "format", > + .attrs = arc_pmu_format_attrs, > +}; > + > +static ssize_t > +arc_pmu_events_sysfs_show(struct device *dev, > + struct device_attribute *attr, char *page) > +{ > + struct perf_pmu_events_attr *pmu_attr; > + > + pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr); > + return sprintf(page, "event=0x%04llx\n", pmu_attr->id); > +} > + > +/* > + * We don't add attrs here as we don't have pre-defined list of perf events. > + * We will generate and add attrs dynamically in probe() after we read HW > + * configuration. > + */ > +static struct attribute_group arc_pmu_events_attr_gr = { > + .name = "events", > +}; > + > +static void arc_pmu_add_raw_event_attr(int j, char *str) > +{ > + memmove(arc_pmu->raw_entry[j].name, str, ARCPMU_EVENT_NAME_LEN - 1); > + arc_pmu->attr[j].attr.attr.name = arc_pmu->raw_entry[j].name; > + arc_pmu->attr[j].attr.attr.mode = VERIFY_OCTAL_PERMISSIONS(0444); > + arc_pmu->attr[j].attr.show = arc_pmu_events_sysfs_show; > + arc_pmu->attr[j].id = j; > + arc_pmu->attrs[j] = &(arc_pmu->attr[j].attr.attr); > +} > + > +static int arc_pmu_raw_alloc(struct device *dev) > +{ > +
Re: [PATCH v2 00/22] mfd: demodularization of non-modular drivers
[RE: [PATCH v2 00/22] mfd: demodularization of non-modular drivers] On 05/12/2018 (Wed 12:01) Steve Twiss wrote: > Hi Paul, > > On 03 December 2018 04:23, Paul Gortmaker wrote: > > > Subject: [PATCH v2 00/22] mfd: demodularization of non-modular drivers > > > > [v1 --> v2: add some more commits as requested by Lee (MFD maintainer), > > update the 00/NN text; re-do build and link testing on new linux-next. ] > > > > This group of MFD drivers are all controlled by "bool" Kconfig settings, > > but contain various amounts of largely pointless uses of infrastructure > > related to modular operations, even though they can't be built modular. > > > > We can easily remove/replace all of it. We are trying to make driver > > code consistent with the Makefiles/Kconfigs that control them. > > For the DA9052 and DA9055, changes: > > - drivers/mfd/da9052-core.c | 11 --- > - drivers/mfd/da9052-i2c.c | 22 ++--- > - drivers/mfd/da9052-irq.c | 1 - > - drivers/mfd/da9052-spi.c | 22 ++--- > - drivers/mfd/da9055-core.c | 13 ++--- > - drivers/mfd/da9055-i2c.c | 24 ++- > > The responsibility can fall back to Dialog for these changes. We will > submit Kconfig patches for these and make them explicitly non-modular. > This will remove the ambiguity caused by the Kconfig bool options. [typo noted: non-modular ---> modular] Great, I'll look forward to seeing those patches to convert these drivers to modular in the near future, and I'll not re-send mine. Thanks, P. -- > > Regards, > Steve > > > This > > means not using modular functions/macros for drivers that can never be > > built as a module. Some of the downfalls this oversight leads to are: > > > > (1) it is easy to accidentally write unused module_exit and remove code > > (2) it can be misleading when reading the source, thinking it can be > > modular when the Makefile and/or Kconfig prohibit it > > (3) it requires the include of the module.h header file which in turn > > includes nearly everything else, thus adding a lot of CPP overhead. > > (4) it gets copied/replicated into other drivers and spreads quickly. > > > > What we see in current drivers falls into one or more of the following > > categories: > > > > 1) include of when it simply isn't needed > > > > 2) Use of MODULE_LICENSE, MODULE_DEVICE_TABLE, MODULE_AUTHOR etc. > >macros that are no-ops for non-modular drivers. > > > > 3) Creation of a module_exit() function that will be compiled and > >linked in but never actually called for non-modular drivers. > > > > 4) Addition of a platform_driver ".remove" function that is bound > >into the struct but will never be called for non-module use cases. > > > > Solution to #1 --> #3 is simple ; we just delete the related code. > > > > The solution to #4 is similar - we delete the ".remove" function and > > the binding into the platform_driver struct. However, since the same > > ".remove" function could also be triggered by an "unbind" (such as for > > pass-through of a device to a guest instance) - so we also explicitly > > disable any unbind for the driver. > > > > The unbind mask allows us to ensure we will see if there was some odd > > corner case out there that was relying on it. Typically it would be a > > multi-port ethernet card passing a port through to a guest, so a > > sensible use case in MFD drivers seems highly unlikely. This same > > solution has already been used in multiple other mainline subsystems. > > > > Build testing was done on drivers/mfd for allyesconfig on x86_64, ARM > > and ARM-64 on a recent linux-next checkout. > > > > Paul. > > > > --- > > > > Cc: Arnd Bergmann > > Cc: Cory Maccarrone > > Cc: David Dajun Chen > > Cc: Dong Aisheng > > Cc: Eric Miao > > Cc: Graeme Gregory > > Cc: Guennadi Liakhovetski > > Cc: Haojian Zhuang > > Cc: Jin Park > > Cc: Jorge Eduardo Candelaria > > Cc: Laxman Dewangan > > Cc: Lee Jones > > Cc: Linus Walleij > > Cc: Mark Brown > > Cc: Mattias Nilsson > > Cc: Michael Hennerich > > Cc: Mike Rapoport > > Cc: Tony Lindgren > > Cc: Venu Byravarasu > > Cc: linux-o...@vger.kernel.org > > Cc: patc...@opensource.cirrus.com > > Cc: Support Opensource > > > > > > Paul Gortmaker (22): > > mfd: aat2870-core: Make it explicitly non-modular > > mfd: adp5520: Make it explicitly non-modular > > mfd: as3711: Make it explicitly non-modular > > mfd: da903x: Make it explicitly non-modular > > mfd: da9052-*: Make it explicitly non-modular > > mfd: da9055-i2c: Make it explicitly non-modular > > mfd: da9055-core: make it explicitly non-modular > > mfd: db8500-prcmu: drop unused MODULE_ tags from non-modular code > > mfd: htc-i2cpld: Make it explicitly non-modular > > mfd: max8925-core: drop unused MODULE_ tags from non-modular code > > mfd: rc5t583: Make it explicitly non-modular > > mfd: sta2x11: drop unused
Re: [PATCH v2 00/22] mfd: demodularization of non-modular drivers
[RE: [PATCH v2 00/22] mfd: demodularization of non-modular drivers] On 05/12/2018 (Wed 12:01) Steve Twiss wrote: > Hi Paul, > > On 03 December 2018 04:23, Paul Gortmaker wrote: > > > Subject: [PATCH v2 00/22] mfd: demodularization of non-modular drivers > > > > [v1 --> v2: add some more commits as requested by Lee (MFD maintainer), > > update the 00/NN text; re-do build and link testing on new linux-next. ] > > > > This group of MFD drivers are all controlled by "bool" Kconfig settings, > > but contain various amounts of largely pointless uses of infrastructure > > related to modular operations, even though they can't be built modular. > > > > We can easily remove/replace all of it. We are trying to make driver > > code consistent with the Makefiles/Kconfigs that control them. > > For the DA9052 and DA9055, changes: > > - drivers/mfd/da9052-core.c | 11 --- > - drivers/mfd/da9052-i2c.c | 22 ++--- > - drivers/mfd/da9052-irq.c | 1 - > - drivers/mfd/da9052-spi.c | 22 ++--- > - drivers/mfd/da9055-core.c | 13 ++--- > - drivers/mfd/da9055-i2c.c | 24 ++- > > The responsibility can fall back to Dialog for these changes. We will > submit Kconfig patches for these and make them explicitly non-modular. > This will remove the ambiguity caused by the Kconfig bool options. [typo noted: non-modular ---> modular] Great, I'll look forward to seeing those patches to convert these drivers to modular in the near future, and I'll not re-send mine. Thanks, P. -- > > Regards, > Steve > > > This > > means not using modular functions/macros for drivers that can never be > > built as a module. Some of the downfalls this oversight leads to are: > > > > (1) it is easy to accidentally write unused module_exit and remove code > > (2) it can be misleading when reading the source, thinking it can be > > modular when the Makefile and/or Kconfig prohibit it > > (3) it requires the include of the module.h header file which in turn > > includes nearly everything else, thus adding a lot of CPP overhead. > > (4) it gets copied/replicated into other drivers and spreads quickly. > > > > What we see in current drivers falls into one or more of the following > > categories: > > > > 1) include of when it simply isn't needed > > > > 2) Use of MODULE_LICENSE, MODULE_DEVICE_TABLE, MODULE_AUTHOR etc. > >macros that are no-ops for non-modular drivers. > > > > 3) Creation of a module_exit() function that will be compiled and > >linked in but never actually called for non-modular drivers. > > > > 4) Addition of a platform_driver ".remove" function that is bound > >into the struct but will never be called for non-module use cases. > > > > Solution to #1 --> #3 is simple ; we just delete the related code. > > > > The solution to #4 is similar - we delete the ".remove" function and > > the binding into the platform_driver struct. However, since the same > > ".remove" function could also be triggered by an "unbind" (such as for > > pass-through of a device to a guest instance) - so we also explicitly > > disable any unbind for the driver. > > > > The unbind mask allows us to ensure we will see if there was some odd > > corner case out there that was relying on it. Typically it would be a > > multi-port ethernet card passing a port through to a guest, so a > > sensible use case in MFD drivers seems highly unlikely. This same > > solution has already been used in multiple other mainline subsystems. > > > > Build testing was done on drivers/mfd for allyesconfig on x86_64, ARM > > and ARM-64 on a recent linux-next checkout. > > > > Paul. > > > > --- > > > > Cc: Arnd Bergmann > > Cc: Cory Maccarrone > > Cc: David Dajun Chen > > Cc: Dong Aisheng > > Cc: Eric Miao > > Cc: Graeme Gregory > > Cc: Guennadi Liakhovetski > > Cc: Haojian Zhuang > > Cc: Jin Park > > Cc: Jorge Eduardo Candelaria > > Cc: Laxman Dewangan > > Cc: Lee Jones > > Cc: Linus Walleij > > Cc: Mark Brown > > Cc: Mattias Nilsson > > Cc: Michael Hennerich > > Cc: Mike Rapoport > > Cc: Tony Lindgren > > Cc: Venu Byravarasu > > Cc: linux-o...@vger.kernel.org > > Cc: patc...@opensource.cirrus.com > > Cc: Support Opensource > > > > > > Paul Gortmaker (22): > > mfd: aat2870-core: Make it explicitly non-modular > > mfd: adp5520: Make it explicitly non-modular > > mfd: as3711: Make it explicitly non-modular > > mfd: da903x: Make it explicitly non-modular > > mfd: da9052-*: Make it explicitly non-modular > > mfd: da9055-i2c: Make it explicitly non-modular > > mfd: da9055-core: make it explicitly non-modular > > mfd: db8500-prcmu: drop unused MODULE_ tags from non-modular code > > mfd: htc-i2cpld: Make it explicitly non-modular > > mfd: max8925-core: drop unused MODULE_ tags from non-modular code > > mfd: rc5t583: Make it explicitly non-modular > > mfd: sta2x11: drop unused
Re: Re: Re: [PATCH] proc/sysctl: fix return error forproc_doulongvec_minmax
On Wed, Dec 05, 2018 at 03:10:07PM +0800, cheng.lin...@zte.com.cn wrote: > > On Mon, Dec 03, 2018 at 01:12:39PM +0800, cheng.lin...@zte.com.cn wrote: > > > >Cheng, thanks for the patch! > > > > > > > >On Fri, Nov 30, 2018 at 02:35:17PM +0800, Cheng Lin wrote: > > > >> If the number of input parameters is less than the total > > > >> parameters, an INVAL error will be returned. > > > > > > > >Do you mean EINVAL? > > > > > > > Yes, it's EINVAL. > > > > Please adjust the commit log. > > > > > >> This patch ensure no error returned in this condition, just > > > >> like other interfaces do. > > > > > > > >Have an actual example to reproduce? > > > > > > > >Luis > > > > > > > We use proc_doulongvec_minmax to pass up to two parameters with > > > kern_table. > > > e.g. > > { > > > .procname = "monitor_signals", > > > .data = _sigs, > > > .maxlen = 2*sizeof(unsigned long), > > > .mode = 0644, > > > .proc_handler = proc_doulongvec_minmax, > > > }, > > > > > > Reproduce: > > > When passing two parameters, it's work normal. But passing only one > > > parameter, an error "Invalid argument"(EINVAL) is returned. > > > [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals > > > [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals > > > 1 2 > > > [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals > > > -bash: echo: write error: Invalid argument > > > [root@cl150 ~]# echo $? > > > 1 > > > [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals > > > 3 2 > > > [root@cl150 ~]# > > > > > > The following is the result after apply this patch. No error is returned > > > when the number of input parameters is less than the total parameters. > > > [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals > > > [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals > > > 1 2 > > > [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals > > > [root@cl150 ~]# echo $? > > > 0 > > > [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals > > > 3 2 > > > [root@cl150 ~]# > > > > This would be good to have in the commit log as well. But your patch > > only addresses one of the proc users, there are a few other checks like > > this that would also need to be expanded for this. So please expand > > your patch to cover the other cases as well. > > I have done the check for the interfaces exported in kernel/sysctl.c. > EXPORT_SYMBOL(proc_dointvec); > EXPORT_SYMBOL(proc_douintvec); > EXPORT_SYMBOL(proc_dointvec_jiffies); > EXPORT_SYMBOL(proc_dointvec_minmax); > EXPORT_SYMBOL_GPL(proc_douintvec_minmax); > EXPORT_SYMBOL(proc_dointvec_userhz_jiffies); > EXPORT_SYMBOL(proc_dointvec_ms_jiffies); > EXPORT_SYMBOL(proc_dostring); > EXPORT_SYMBOL(proc_doulongvec_minmax); > EXPORT_SYMBOL(proc_doulongvec_ms_jiffies_minmax); > > The function call relationship is as follows. There are three processing > functions dealing with digital parameters, > __do_proc_dointvec/__do_proc_douintvec/__do_proc_doulongvec_minmax. > > proc_dointvec--| > proc_dointvec_jiffies--| > proc_dointvec_minmax--| > proc_dointvec_userhz_jiffies| > proc_dointvec_ms_jiffies-|-> do_proc_dointvec|-> > __do_proc_dointvec > > proc_douintvec-| > proc_douintvec_minmax-|-> do_proc_douintvec---|-> > __do_proc_douintvec > > proc_doulongvec_minmax---| > proc_doulongvec_ms_jiffies_minmax--|-> do_proc_doulongvec_minmax|-> > __do_proc_doulongvec_minmax OK > This patch deals with __do_proc_doulongvec_minmax, just as > __do_proc_dointvec does, adding a check for parameters 'left'. In > __do_proc_douintvec, its code implementation explicitly does not > support multiple inputs. static int __do_proc_douintvec(...){ > ... > /* > * Arrays are not supported, keep this simple. *Do not* add > * support for them. > */ > if (vleft != 1) { > *lenp = 0; > return -EINVAL; > ... > } > > > So, just __do_proc_doulongvec_minmax has the problem. And most use of > proc_doulongvec_minmax/proc_doulongvec_ms_jiffies_minmax just have one > parameter. The above text, up to my OK, is useful information for the commit log, please add that. > It's well hidden. You mean that the issue is not widely spread? If so please add that comment to the commit log, and resubmit a v2. Luis > Typical multi-parameter applications for > proc_dointvec, such as /proc/sys/kernel/printk.
Re: Re: Re: [PATCH] proc/sysctl: fix return error forproc_doulongvec_minmax
On Wed, Dec 05, 2018 at 03:10:07PM +0800, cheng.lin...@zte.com.cn wrote: > > On Mon, Dec 03, 2018 at 01:12:39PM +0800, cheng.lin...@zte.com.cn wrote: > > > >Cheng, thanks for the patch! > > > > > > > >On Fri, Nov 30, 2018 at 02:35:17PM +0800, Cheng Lin wrote: > > > >> If the number of input parameters is less than the total > > > >> parameters, an INVAL error will be returned. > > > > > > > >Do you mean EINVAL? > > > > > > > Yes, it's EINVAL. > > > > Please adjust the commit log. > > > > > >> This patch ensure no error returned in this condition, just > > > >> like other interfaces do. > > > > > > > >Have an actual example to reproduce? > > > > > > > >Luis > > > > > > > We use proc_doulongvec_minmax to pass up to two parameters with > > > kern_table. > > > e.g. > > { > > > .procname = "monitor_signals", > > > .data = _sigs, > > > .maxlen = 2*sizeof(unsigned long), > > > .mode = 0644, > > > .proc_handler = proc_doulongvec_minmax, > > > }, > > > > > > Reproduce: > > > When passing two parameters, it's work normal. But passing only one > > > parameter, an error "Invalid argument"(EINVAL) is returned. > > > [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals > > > [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals > > > 1 2 > > > [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals > > > -bash: echo: write error: Invalid argument > > > [root@cl150 ~]# echo $? > > > 1 > > > [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals > > > 3 2 > > > [root@cl150 ~]# > > > > > > The following is the result after apply this patch. No error is returned > > > when the number of input parameters is less than the total parameters. > > > [root@cl150 ~]# echo 1 2 > /proc/sys/kernel/monitor_signals > > > [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals > > > 1 2 > > > [root@cl150 ~]# echo 3 > /proc/sys/kernel/monitor_signals > > > [root@cl150 ~]# echo $? > > > 0 > > > [root@cl150 ~]# cat /proc/sys/kernel/monitor_signals > > > 3 2 > > > [root@cl150 ~]# > > > > This would be good to have in the commit log as well. But your patch > > only addresses one of the proc users, there are a few other checks like > > this that would also need to be expanded for this. So please expand > > your patch to cover the other cases as well. > > I have done the check for the interfaces exported in kernel/sysctl.c. > EXPORT_SYMBOL(proc_dointvec); > EXPORT_SYMBOL(proc_douintvec); > EXPORT_SYMBOL(proc_dointvec_jiffies); > EXPORT_SYMBOL(proc_dointvec_minmax); > EXPORT_SYMBOL_GPL(proc_douintvec_minmax); > EXPORT_SYMBOL(proc_dointvec_userhz_jiffies); > EXPORT_SYMBOL(proc_dointvec_ms_jiffies); > EXPORT_SYMBOL(proc_dostring); > EXPORT_SYMBOL(proc_doulongvec_minmax); > EXPORT_SYMBOL(proc_doulongvec_ms_jiffies_minmax); > > The function call relationship is as follows. There are three processing > functions dealing with digital parameters, > __do_proc_dointvec/__do_proc_douintvec/__do_proc_doulongvec_minmax. > > proc_dointvec--| > proc_dointvec_jiffies--| > proc_dointvec_minmax--| > proc_dointvec_userhz_jiffies| > proc_dointvec_ms_jiffies-|-> do_proc_dointvec|-> > __do_proc_dointvec > > proc_douintvec-| > proc_douintvec_minmax-|-> do_proc_douintvec---|-> > __do_proc_douintvec > > proc_doulongvec_minmax---| > proc_doulongvec_ms_jiffies_minmax--|-> do_proc_doulongvec_minmax|-> > __do_proc_doulongvec_minmax OK > This patch deals with __do_proc_doulongvec_minmax, just as > __do_proc_dointvec does, adding a check for parameters 'left'. In > __do_proc_douintvec, its code implementation explicitly does not > support multiple inputs. static int __do_proc_douintvec(...){ > ... > /* > * Arrays are not supported, keep this simple. *Do not* add > * support for them. > */ > if (vleft != 1) { > *lenp = 0; > return -EINVAL; > ... > } > > > So, just __do_proc_doulongvec_minmax has the problem. And most use of > proc_doulongvec_minmax/proc_doulongvec_ms_jiffies_minmax just have one > parameter. The above text, up to my OK, is useful information for the commit log, please add that. > It's well hidden. You mean that the issue is not widely spread? If so please add that comment to the commit log, and resubmit a v2. Luis > Typical multi-parameter applications for > proc_dointvec, such as /proc/sys/kernel/printk.
Re: [RFC PATCH 02/14] mm/hms: heterogenenous memory system (HMS) documentation
On Wed, Dec 05, 2018 at 10:41:56AM -0700, Logan Gunthorpe wrote: > > > On 2018-12-04 7:31 p.m., Jerome Glisse wrote: > > How can i express multiple link, or memory that is only accessible > > by a subset of the devices/CPUs. In today model they are back in > > assumption like everyone can access all the node which do not hold > > in what i am trying to do. > > Well multiple links are easy when you have a 'link' bus. Just add > another link device under the bus. So you are telling do what i am doing in this patchset but not under HMS directory ? > > Technically, the accessibility issue is already encoded in sysfs. For > example, through the PCI tree you can determine which ACS bits are set > and determine which devices are behind the same root bridge the same way > we do in the kernel p2pdma subsystem. This is all bus specific which is > fine, but if we want to change that, we should have a common way for > existing buses to describe these attributes in the existing tree. The > new 'link' bus devices would have to have some way to describe cases if > memory isn't accessible in some way across it. What i am looking at is much more complex than just access bit. It is a whole set of properties attach to each path (can it be cache coherent ? can it do atomic ? what is the access granularity ? what is the bandwidth ? is it dedicated link ? ...) > > But really, I would say the kernel is responsible for telling you when > memory is accessible to a list of initiators, so it should be part of > the checks in a theoretical hbind api. This is already the approach > p2pdma takes in-kernel: we have functions that tell you if two PCI > devices can talk to each other and we have functions to give you memory > accessible by a set of devices. What we don't have is a special tree > that p2pdma users have to walk through to determine accessibility. You do not need it, but i do need it they are user out there that are already depending on the information by getting it through non standard way. I do want to provide a standard way for userspace to get this. They are real user out there and i believe their would be more user if we had a standard way to provide it. You do not believe in it fine. I will do more work in userspace and more example and i will come back with more hard evidence until i convince enough people. > > In my eye's, you are just conflating a bunch of different issues that > are better solved independently in the existing frameworks we have. And > if they were tackled individually, you'd have a much easier time getting > them merged one by one. I don't think i can convince you otherwise. They are user that use topology please looks at the links i provided, those folks have running program _today_ they rely on non standard API and would like to move toward standard API it would improve their life. On top of that i argue that more people would use that information if it were available to them. I agree that i have no hard evidence to back that up and that it is just a feeling but you can not disprove me either as this is a chicken and egg problem, you can not prove people will not use an API if the API is not there to be use. Cheers, Jérôme
Re: [RFC PATCH 02/14] mm/hms: heterogenenous memory system (HMS) documentation
On Wed, Dec 05, 2018 at 10:41:56AM -0700, Logan Gunthorpe wrote: > > > On 2018-12-04 7:31 p.m., Jerome Glisse wrote: > > How can i express multiple link, or memory that is only accessible > > by a subset of the devices/CPUs. In today model they are back in > > assumption like everyone can access all the node which do not hold > > in what i am trying to do. > > Well multiple links are easy when you have a 'link' bus. Just add > another link device under the bus. So you are telling do what i am doing in this patchset but not under HMS directory ? > > Technically, the accessibility issue is already encoded in sysfs. For > example, through the PCI tree you can determine which ACS bits are set > and determine which devices are behind the same root bridge the same way > we do in the kernel p2pdma subsystem. This is all bus specific which is > fine, but if we want to change that, we should have a common way for > existing buses to describe these attributes in the existing tree. The > new 'link' bus devices would have to have some way to describe cases if > memory isn't accessible in some way across it. What i am looking at is much more complex than just access bit. It is a whole set of properties attach to each path (can it be cache coherent ? can it do atomic ? what is the access granularity ? what is the bandwidth ? is it dedicated link ? ...) > > But really, I would say the kernel is responsible for telling you when > memory is accessible to a list of initiators, so it should be part of > the checks in a theoretical hbind api. This is already the approach > p2pdma takes in-kernel: we have functions that tell you if two PCI > devices can talk to each other and we have functions to give you memory > accessible by a set of devices. What we don't have is a special tree > that p2pdma users have to walk through to determine accessibility. You do not need it, but i do need it they are user out there that are already depending on the information by getting it through non standard way. I do want to provide a standard way for userspace to get this. They are real user out there and i believe their would be more user if we had a standard way to provide it. You do not believe in it fine. I will do more work in userspace and more example and i will come back with more hard evidence until i convince enough people. > > In my eye's, you are just conflating a bunch of different issues that > are better solved independently in the existing frameworks we have. And > if they were tackled individually, you'd have a much easier time getting > them merged one by one. I don't think i can convince you otherwise. They are user that use topology please looks at the links i provided, those folks have running program _today_ they rely on non standard API and would like to move toward standard API it would improve their life. On top of that i argue that more people would use that information if it were available to them. I agree that i have no hard evidence to back that up and that it is just a feeling but you can not disprove me either as this is a chicken and egg problem, you can not prove people will not use an API if the API is not there to be use. Cheers, Jérôme
[tip:x86/mm] x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init()
Commit-ID: ba6f508d0ec4adb09f0a939af6d5e19cdfa8667d Gitweb: https://git.kernel.org/tip/ba6f508d0ec4adb09f0a939af6d5e19cdfa8667d Author: Dan Williams AuthorDate: Tue, 4 Dec 2018 13:37:27 -0800 Committer: Ingo Molnar CommitDate: Wed, 5 Dec 2018 09:03:07 +0100 x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init() Commit: f77084d96355 "x86/mm/pat: Disable preemption around __flush_tlb_all()" addressed a case where __flush_tlb_all() is called without preemption being disabled. It also left a warning to catch other cases where preemption is not disabled. That warning triggers for the memory hotplug path which is also used for persistent memory enabling: WARNING: CPU: 35 PID: 911 at ./arch/x86/include/asm/tlbflush.h:460 RIP: 0010:__flush_tlb_all+0x1b/0x3a [..] Call Trace: phys_pud_init+0x29c/0x2bb kernel_physical_mapping_init+0xfc/0x219 init_memory_mapping+0x1a5/0x3b0 arch_add_memory+0x2c/0x50 devm_memremap_pages+0x3aa/0x610 pmem_attach_disk+0x585/0x700 [nd_pmem] Andy wondered why a path that can sleep was using __flush_tlb_all() [1] and Dave confirmed the expectation for TLB flush is for modifying / invalidating existing PTE entries, but not initial population [2]. Drop the usage of __flush_tlb_all() in phys_{p4d,pud,pmd}_init() on the expectation that this path is only ever populating empty entries for the linear map. Note, at linear map teardown time there is a call to the all-cpu flush_tlb_all() to invalidate the removed mappings. [1]: https://lkml.kernel.org/r/9dfd717d-857d-493d-a606-b635d72ba...@amacapital.net [2]: https://lkml.kernel.org/r/749919a4-cdb1-48a3-adb4-adb81a5fa...@intel.com [ mingo: Minor readability edits. ] Suggested-by: Dave Hansen Reported-by: Andy Lutomirski Signed-off-by: Dan Williams Acked-by: Peter Zijlstra (Intel) Acked-by: Kirill A. Shutemov Cc: Cc: Borislav Petkov Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Sebastian Andrzej Siewior Cc: Thomas Gleixner Cc: dave.han...@intel.com Fixes: f77084d96355 ("x86/mm/pat: Disable preemption around __flush_tlb_all()") Link: http://lkml.kernel.org/r/154395944713.32119.15611079023837132638.st...@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar --- arch/x86/mm/init_64.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 3e25ac2793ef..484c1b92f078 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -584,7 +584,6 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end, paddr_end, page_size_mask, prot); - __flush_tlb_all(); continue; } /* @@ -627,7 +626,6 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end, pud_populate_safe(_mm, pud, pmd); spin_unlock(_mm.page_table_lock); } - __flush_tlb_all(); update_page_count(PG_LEVEL_1G, pages); @@ -668,7 +666,6 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end, paddr_last = phys_pud_init(pud, paddr, paddr_end, page_size_mask); - __flush_tlb_all(); continue; } @@ -680,7 +677,6 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end, p4d_populate_safe(_mm, p4d, pud); spin_unlock(_mm.page_table_lock); } - __flush_tlb_all(); return paddr_last; } @@ -733,8 +729,6 @@ kernel_physical_mapping_init(unsigned long paddr_start, if (pgd_changed) sync_global_pgds(vaddr_start, vaddr_end - 1); - __flush_tlb_all(); - return paddr_last; }
[tip:x86/mm] x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init()
Commit-ID: ba6f508d0ec4adb09f0a939af6d5e19cdfa8667d Gitweb: https://git.kernel.org/tip/ba6f508d0ec4adb09f0a939af6d5e19cdfa8667d Author: Dan Williams AuthorDate: Tue, 4 Dec 2018 13:37:27 -0800 Committer: Ingo Molnar CommitDate: Wed, 5 Dec 2018 09:03:07 +0100 x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init() Commit: f77084d96355 "x86/mm/pat: Disable preemption around __flush_tlb_all()" addressed a case where __flush_tlb_all() is called without preemption being disabled. It also left a warning to catch other cases where preemption is not disabled. That warning triggers for the memory hotplug path which is also used for persistent memory enabling: WARNING: CPU: 35 PID: 911 at ./arch/x86/include/asm/tlbflush.h:460 RIP: 0010:__flush_tlb_all+0x1b/0x3a [..] Call Trace: phys_pud_init+0x29c/0x2bb kernel_physical_mapping_init+0xfc/0x219 init_memory_mapping+0x1a5/0x3b0 arch_add_memory+0x2c/0x50 devm_memremap_pages+0x3aa/0x610 pmem_attach_disk+0x585/0x700 [nd_pmem] Andy wondered why a path that can sleep was using __flush_tlb_all() [1] and Dave confirmed the expectation for TLB flush is for modifying / invalidating existing PTE entries, but not initial population [2]. Drop the usage of __flush_tlb_all() in phys_{p4d,pud,pmd}_init() on the expectation that this path is only ever populating empty entries for the linear map. Note, at linear map teardown time there is a call to the all-cpu flush_tlb_all() to invalidate the removed mappings. [1]: https://lkml.kernel.org/r/9dfd717d-857d-493d-a606-b635d72ba...@amacapital.net [2]: https://lkml.kernel.org/r/749919a4-cdb1-48a3-adb4-adb81a5fa...@intel.com [ mingo: Minor readability edits. ] Suggested-by: Dave Hansen Reported-by: Andy Lutomirski Signed-off-by: Dan Williams Acked-by: Peter Zijlstra (Intel) Acked-by: Kirill A. Shutemov Cc: Cc: Borislav Petkov Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Sebastian Andrzej Siewior Cc: Thomas Gleixner Cc: dave.han...@intel.com Fixes: f77084d96355 ("x86/mm/pat: Disable preemption around __flush_tlb_all()") Link: http://lkml.kernel.org/r/154395944713.32119.15611079023837132638.st...@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar --- arch/x86/mm/init_64.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 3e25ac2793ef..484c1b92f078 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -584,7 +584,6 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end, paddr_end, page_size_mask, prot); - __flush_tlb_all(); continue; } /* @@ -627,7 +626,6 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end, pud_populate_safe(_mm, pud, pmd); spin_unlock(_mm.page_table_lock); } - __flush_tlb_all(); update_page_count(PG_LEVEL_1G, pages); @@ -668,7 +666,6 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end, paddr_last = phys_pud_init(pud, paddr, paddr_end, page_size_mask); - __flush_tlb_all(); continue; } @@ -680,7 +677,6 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end, p4d_populate_safe(_mm, p4d, pud); spin_unlock(_mm.page_table_lock); } - __flush_tlb_all(); return paddr_last; } @@ -733,8 +729,6 @@ kernel_physical_mapping_init(unsigned long paddr_start, if (pgd_changed) sync_global_pgds(vaddr_start, vaddr_end - 1); - __flush_tlb_all(); - return paddr_last; }
[tip:x86/mm] generic/pgtable: Introduce set_pte_safe()
Commit-ID: 4369deaa2f022ef92da45a0e7eec8a4a52e8e8a4 Gitweb: https://git.kernel.org/tip/4369deaa2f022ef92da45a0e7eec8a4a52e8e8a4 Author: Dan Williams AuthorDate: Tue, 4 Dec 2018 13:37:16 -0800 Committer: Ingo Molnar CommitDate: Wed, 5 Dec 2018 09:03:06 +0100 generic/pgtable: Introduce set_pte_safe() Commit: f77084d96355 "x86/mm/pat: Disable preemption around __flush_tlb_all()" introduced a warning to capture cases __flush_tlb_all() is called without pre-emption disabled. It triggers a false positive warning in the memory hotplug path. On investigation it was found that the __flush_tlb_all() calls are not necessary. However, they are only "not necessary" in practice provided the ptes are being initially populated from the !present state. Introduce set_pte_safe() as a sanity check that the pte is being updated in a way that does not require a TLB flush. Forgive the macro, the availability of the various of set_pte() levels is hit and miss across architectures. [ mingo: Minor readability edits. ] Suggested-by: Peter Zijlstra Suggested-by: Dave Hansen Signed-off-by: Dan Williams Acked-by: Peter Zijlstra (Intel) Acked-by: Kirill A. Shutemov Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Rik van Riel Cc: Sebastian Andrzej Siewior Cc: Thomas Gleixner Link: https://lkml.kernel.org/r/279dadae-9148-465c-7ec6-3f37e026c...@intel.com Signed-off-by: Ingo Molnar --- include/asm-generic/pgtable.h | 38 ++ 1 file changed, 38 insertions(+) diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index dae7f98babed..a9cac82e9a7a 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -400,6 +400,44 @@ static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b) } #endif +/* + * Use set_p*_safe(), and elide TLB flushing, when confident that *no* + * TLB flush will be required as a result of the "set". For example, use + * in scenarios where it is known ahead of time that the routine is + * setting non-present entries, or re-setting an existing entry to the + * same value. Otherwise, use the typical "set" helpers and flush the + * TLB. + */ +#define set_pte_safe(ptep, pte) \ +({ \ + WARN_ON_ONCE(pte_present(*ptep) && !pte_same(*ptep, pte)); \ + set_pte(ptep, pte); \ +}) + +#define set_pmd_safe(pmdp, pmd) \ +({ \ + WARN_ON_ONCE(pmd_present(*pmdp) && !pmd_same(*pmdp, pmd)); \ + set_pmd(pmdp, pmd); \ +}) + +#define set_pud_safe(pudp, pud) \ +({ \ + WARN_ON_ONCE(pud_present(*pudp) && !pud_same(*pudp, pud)); \ + set_pud(pudp, pud); \ +}) + +#define set_p4d_safe(p4dp, p4d) \ +({ \ + WARN_ON_ONCE(p4d_present(*p4dp) && !p4d_same(*p4dp, p4d)); \ + set_p4d(p4dp, p4d); \ +}) + +#define set_pgd_safe(pgdp, pgd) \ +({ \ + WARN_ON_ONCE(pgd_present(*pgdp) && !pgd_same(*pgdp, pgd)); \ + set_pgd(pgdp, pgd); \ +}) + #ifndef __HAVE_ARCH_DO_SWAP_PAGE /* * Some architectures support metadata associated with a page. When a
[tip:x86/mm] generic/pgtable: Introduce set_pte_safe()
Commit-ID: 4369deaa2f022ef92da45a0e7eec8a4a52e8e8a4 Gitweb: https://git.kernel.org/tip/4369deaa2f022ef92da45a0e7eec8a4a52e8e8a4 Author: Dan Williams AuthorDate: Tue, 4 Dec 2018 13:37:16 -0800 Committer: Ingo Molnar CommitDate: Wed, 5 Dec 2018 09:03:06 +0100 generic/pgtable: Introduce set_pte_safe() Commit: f77084d96355 "x86/mm/pat: Disable preemption around __flush_tlb_all()" introduced a warning to capture cases __flush_tlb_all() is called without pre-emption disabled. It triggers a false positive warning in the memory hotplug path. On investigation it was found that the __flush_tlb_all() calls are not necessary. However, they are only "not necessary" in practice provided the ptes are being initially populated from the !present state. Introduce set_pte_safe() as a sanity check that the pte is being updated in a way that does not require a TLB flush. Forgive the macro, the availability of the various of set_pte() levels is hit and miss across architectures. [ mingo: Minor readability edits. ] Suggested-by: Peter Zijlstra Suggested-by: Dave Hansen Signed-off-by: Dan Williams Acked-by: Peter Zijlstra (Intel) Acked-by: Kirill A. Shutemov Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Rik van Riel Cc: Sebastian Andrzej Siewior Cc: Thomas Gleixner Link: https://lkml.kernel.org/r/279dadae-9148-465c-7ec6-3f37e026c...@intel.com Signed-off-by: Ingo Molnar --- include/asm-generic/pgtable.h | 38 ++ 1 file changed, 38 insertions(+) diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index dae7f98babed..a9cac82e9a7a 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -400,6 +400,44 @@ static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b) } #endif +/* + * Use set_p*_safe(), and elide TLB flushing, when confident that *no* + * TLB flush will be required as a result of the "set". For example, use + * in scenarios where it is known ahead of time that the routine is + * setting non-present entries, or re-setting an existing entry to the + * same value. Otherwise, use the typical "set" helpers and flush the + * TLB. + */ +#define set_pte_safe(ptep, pte) \ +({ \ + WARN_ON_ONCE(pte_present(*ptep) && !pte_same(*ptep, pte)); \ + set_pte(ptep, pte); \ +}) + +#define set_pmd_safe(pmdp, pmd) \ +({ \ + WARN_ON_ONCE(pmd_present(*pmdp) && !pmd_same(*pmdp, pmd)); \ + set_pmd(pmdp, pmd); \ +}) + +#define set_pud_safe(pudp, pud) \ +({ \ + WARN_ON_ONCE(pud_present(*pudp) && !pud_same(*pudp, pud)); \ + set_pud(pudp, pud); \ +}) + +#define set_p4d_safe(p4dp, p4d) \ +({ \ + WARN_ON_ONCE(p4d_present(*p4dp) && !p4d_same(*p4dp, p4d)); \ + set_p4d(p4dp, p4d); \ +}) + +#define set_pgd_safe(pgdp, pgd) \ +({ \ + WARN_ON_ONCE(pgd_present(*pgdp) && !pgd_same(*pgdp, pgd)); \ + set_pgd(pgdp, pgd); \ +}) + #ifndef __HAVE_ARCH_DO_SWAP_PAGE /* * Some architectures support metadata associated with a page. When a
[tip:x86/mm] x86/mm: Validate kernel_physical_mapping_init() PTE population
Commit-ID: 0a9fe8ca844d43f3f547f0e166122b6048121c8f Gitweb: https://git.kernel.org/tip/0a9fe8ca844d43f3f547f0e166122b6048121c8f Author: Dan Williams AuthorDate: Tue, 4 Dec 2018 13:37:21 -0800 Committer: Ingo Molnar CommitDate: Wed, 5 Dec 2018 09:03:06 +0100 x86/mm: Validate kernel_physical_mapping_init() PTE population The usage of __flush_tlb_all() in the kernel_physical_mapping_init() path is not necessary. In general flushing the TLB is not required when updating an entry from the !present state. However, to give confidence in the future removal of TLB flushing in this path, use the new set_pte_safe() family of helpers to assert that the !present assumption is true in this path. [ mingo: Minor readability edits. ] Suggested-by: Peter Zijlstra Suggested-by: Dave Hansen Signed-off-by: Dan Williams Acked-by: Kirill A. Shutemov Acked-by: Peter Zijlstra (Intel) Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Rik van Riel Cc: Sebastian Andrzej Siewior Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/154395944177.32119.8524957429632012270.st...@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar --- arch/x86/include/asm/pgalloc.h | 27 +++ arch/x86/mm/init_64.c| 24 include/asm-generic/5level-fixup.h | 1 + include/asm-generic/pgtable-nop4d-hack.h | 1 + include/asm-generic/pgtable-nop4d.h | 1 + include/asm-generic/pgtable-nopud.h | 1 + 6 files changed, 43 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h index ec7f43327033..1ea41aaef68b 100644 --- a/arch/x86/include/asm/pgalloc.h +++ b/arch/x86/include/asm/pgalloc.h @@ -80,6 +80,13 @@ static inline void pmd_populate_kernel(struct mm_struct *mm, set_pmd(pmd, __pmd(__pa(pte) | _PAGE_TABLE)); } +static inline void pmd_populate_kernel_safe(struct mm_struct *mm, + pmd_t *pmd, pte_t *pte) +{ + paravirt_alloc_pte(mm, __pa(pte) >> PAGE_SHIFT); + set_pmd_safe(pmd, __pmd(__pa(pte) | _PAGE_TABLE)); +} + static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd, struct page *pte) { @@ -132,6 +139,12 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd) paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT); set_pud(pud, __pud(_PAGE_TABLE | __pa(pmd))); } + +static inline void pud_populate_safe(struct mm_struct *mm, pud_t *pud, pmd_t *pmd) +{ + paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT); + set_pud_safe(pud, __pud(_PAGE_TABLE | __pa(pmd))); +} #endif /* CONFIG_X86_PAE */ #if CONFIG_PGTABLE_LEVELS > 3 @@ -141,6 +154,12 @@ static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, pud_t *pud) set_p4d(p4d, __p4d(_PAGE_TABLE | __pa(pud))); } +static inline void p4d_populate_safe(struct mm_struct *mm, p4d_t *p4d, pud_t *pud) +{ + paravirt_alloc_pud(mm, __pa(pud) >> PAGE_SHIFT); + set_p4d_safe(p4d, __p4d(_PAGE_TABLE | __pa(pud))); +} + static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr) { gfp_t gfp = GFP_KERNEL_ACCOUNT; @@ -173,6 +192,14 @@ static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, p4d_t *p4d) set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(p4d))); } +static inline void pgd_populate_safe(struct mm_struct *mm, pgd_t *pgd, p4d_t *p4d) +{ + if (!pgtable_l5_enabled()) + return; + paravirt_alloc_p4d(mm, __pa(p4d) >> PAGE_SHIFT); + set_pgd_safe(pgd, __pgd(_PAGE_TABLE | __pa(p4d))); +} + static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr) { gfp_t gfp = GFP_KERNEL_ACCOUNT; diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 5fab264948c2..3e25ac2793ef 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -432,7 +432,7 @@ phys_pte_init(pte_t *pte_page, unsigned long paddr, unsigned long paddr_end, E820_TYPE_RAM) && !e820__mapped_any(paddr & PAGE_MASK, paddr_next, E820_TYPE_RESERVED_KERN)) - set_pte(pte, __pte(0)); + set_pte_safe(pte, __pte(0)); continue; } @@ -452,7 +452,7 @@ phys_pte_init(pte_t *pte_page, unsigned long paddr, unsigned long paddr_end, pr_info(" pte=%p addr=%lx pte=%016lx\n", pte, paddr, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL).pte); pages++; - set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot)); + set_pte_safe(pte, pfn_pte(paddr >> PAGE_SHIFT, prot)); paddr_last = (paddr & PAGE_MASK) + PAGE_SIZE; } @@ -487,7 +487,7 @@ phys_pmd_init(pmd_t
[tip:x86/mm] x86/mm: Validate kernel_physical_mapping_init() PTE population
Commit-ID: 0a9fe8ca844d43f3f547f0e166122b6048121c8f Gitweb: https://git.kernel.org/tip/0a9fe8ca844d43f3f547f0e166122b6048121c8f Author: Dan Williams AuthorDate: Tue, 4 Dec 2018 13:37:21 -0800 Committer: Ingo Molnar CommitDate: Wed, 5 Dec 2018 09:03:06 +0100 x86/mm: Validate kernel_physical_mapping_init() PTE population The usage of __flush_tlb_all() in the kernel_physical_mapping_init() path is not necessary. In general flushing the TLB is not required when updating an entry from the !present state. However, to give confidence in the future removal of TLB flushing in this path, use the new set_pte_safe() family of helpers to assert that the !present assumption is true in this path. [ mingo: Minor readability edits. ] Suggested-by: Peter Zijlstra Suggested-by: Dave Hansen Signed-off-by: Dan Williams Acked-by: Kirill A. Shutemov Acked-by: Peter Zijlstra (Intel) Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Rik van Riel Cc: Sebastian Andrzej Siewior Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/154395944177.32119.8524957429632012270.st...@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar --- arch/x86/include/asm/pgalloc.h | 27 +++ arch/x86/mm/init_64.c| 24 include/asm-generic/5level-fixup.h | 1 + include/asm-generic/pgtable-nop4d-hack.h | 1 + include/asm-generic/pgtable-nop4d.h | 1 + include/asm-generic/pgtable-nopud.h | 1 + 6 files changed, 43 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h index ec7f43327033..1ea41aaef68b 100644 --- a/arch/x86/include/asm/pgalloc.h +++ b/arch/x86/include/asm/pgalloc.h @@ -80,6 +80,13 @@ static inline void pmd_populate_kernel(struct mm_struct *mm, set_pmd(pmd, __pmd(__pa(pte) | _PAGE_TABLE)); } +static inline void pmd_populate_kernel_safe(struct mm_struct *mm, + pmd_t *pmd, pte_t *pte) +{ + paravirt_alloc_pte(mm, __pa(pte) >> PAGE_SHIFT); + set_pmd_safe(pmd, __pmd(__pa(pte) | _PAGE_TABLE)); +} + static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd, struct page *pte) { @@ -132,6 +139,12 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd) paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT); set_pud(pud, __pud(_PAGE_TABLE | __pa(pmd))); } + +static inline void pud_populate_safe(struct mm_struct *mm, pud_t *pud, pmd_t *pmd) +{ + paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT); + set_pud_safe(pud, __pud(_PAGE_TABLE | __pa(pmd))); +} #endif /* CONFIG_X86_PAE */ #if CONFIG_PGTABLE_LEVELS > 3 @@ -141,6 +154,12 @@ static inline void p4d_populate(struct mm_struct *mm, p4d_t *p4d, pud_t *pud) set_p4d(p4d, __p4d(_PAGE_TABLE | __pa(pud))); } +static inline void p4d_populate_safe(struct mm_struct *mm, p4d_t *p4d, pud_t *pud) +{ + paravirt_alloc_pud(mm, __pa(pud) >> PAGE_SHIFT); + set_p4d_safe(p4d, __p4d(_PAGE_TABLE | __pa(pud))); +} + static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr) { gfp_t gfp = GFP_KERNEL_ACCOUNT; @@ -173,6 +192,14 @@ static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, p4d_t *p4d) set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(p4d))); } +static inline void pgd_populate_safe(struct mm_struct *mm, pgd_t *pgd, p4d_t *p4d) +{ + if (!pgtable_l5_enabled()) + return; + paravirt_alloc_p4d(mm, __pa(p4d) >> PAGE_SHIFT); + set_pgd_safe(pgd, __pgd(_PAGE_TABLE | __pa(p4d))); +} + static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr) { gfp_t gfp = GFP_KERNEL_ACCOUNT; diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 5fab264948c2..3e25ac2793ef 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -432,7 +432,7 @@ phys_pte_init(pte_t *pte_page, unsigned long paddr, unsigned long paddr_end, E820_TYPE_RAM) && !e820__mapped_any(paddr & PAGE_MASK, paddr_next, E820_TYPE_RESERVED_KERN)) - set_pte(pte, __pte(0)); + set_pte_safe(pte, __pte(0)); continue; } @@ -452,7 +452,7 @@ phys_pte_init(pte_t *pte_page, unsigned long paddr, unsigned long paddr_end, pr_info(" pte=%p addr=%lx pte=%016lx\n", pte, paddr, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL).pte); pages++; - set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot)); + set_pte_safe(pte, pfn_pte(paddr >> PAGE_SHIFT, prot)); paddr_last = (paddr & PAGE_MASK) + PAGE_SIZE; } @@ -487,7 +487,7 @@ phys_pmd_init(pmd_t
[tip:x86/asm] x86/vdso: Remove a stale/misleading comment from the linker script
Commit-ID: 29434801e7e9c6d05fbea4533b3c0bd6be612f62 Gitweb: https://git.kernel.org/tip/29434801e7e9c6d05fbea4533b3c0bd6be612f62 Author: Sean Christopherson AuthorDate: Tue, 4 Dec 2018 13:25:58 -0800 Committer: Ingo Molnar CommitDate: Wed, 5 Dec 2018 08:58:12 +0100 x86/vdso: Remove a stale/misleading comment from the linker script Once upon a time, vdso2c aggressively stripped data from the vDSO image when generating the final userspace image. This included stripping the .altinstructions and .altinstr_replacement sections. Eventually, the stripping process reverted to "objdump -S" and no longer removed the aforementioned sections, but the comment remained. Keeping the .alt* sections at the end of the PT_LOAD segment is no longer necessary, but there's no harm in doing so and it's a helpful reminder that they don't need to be included in the final vDSO image, i.e. someone may want to take another stab at zapping/stripping the unneeded sections. Signed-off-by: Sean Christopherson Acked-by: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Thomas Gleixner Fixes: da861e18eccc ("x86, vdso: Get rid of the fake section mechanism") Link: http://lkml.kernel.org/r/20181204212600.28090-3-sean.j.christopher...@intel.com Signed-off-by: Ingo Molnar --- arch/x86/entry/vdso/vdso-layout.lds.S | 5 - 1 file changed, 5 deletions(-) diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S index 0cedc905c8d6..93c6dc7812d0 100644 --- a/arch/x86/entry/vdso/vdso-layout.lds.S +++ b/arch/x86/entry/vdso/vdso-layout.lds.S @@ -65,11 +65,6 @@ SECTIONS .text : { *(.text*) } :text =0x90909090, - /* -* At the end so that eu-elflint stays happy when vdso2c strips -* these. A better implementation would avoid allocating space -* for these. -*/ .altinstructions: { *(.altinstructions) } :text .altinstr_replacement : { *(.altinstr_replacement) } :text
[tip:x86/asm] x86/vdso: Remove a stale/misleading comment from the linker script
Commit-ID: 29434801e7e9c6d05fbea4533b3c0bd6be612f62 Gitweb: https://git.kernel.org/tip/29434801e7e9c6d05fbea4533b3c0bd6be612f62 Author: Sean Christopherson AuthorDate: Tue, 4 Dec 2018 13:25:58 -0800 Committer: Ingo Molnar CommitDate: Wed, 5 Dec 2018 08:58:12 +0100 x86/vdso: Remove a stale/misleading comment from the linker script Once upon a time, vdso2c aggressively stripped data from the vDSO image when generating the final userspace image. This included stripping the .altinstructions and .altinstr_replacement sections. Eventually, the stripping process reverted to "objdump -S" and no longer removed the aforementioned sections, but the comment remained. Keeping the .alt* sections at the end of the PT_LOAD segment is no longer necessary, but there's no harm in doing so and it's a helpful reminder that they don't need to be included in the final vDSO image, i.e. someone may want to take another stab at zapping/stripping the unneeded sections. Signed-off-by: Sean Christopherson Acked-by: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Thomas Gleixner Fixes: da861e18eccc ("x86, vdso: Get rid of the fake section mechanism") Link: http://lkml.kernel.org/r/20181204212600.28090-3-sean.j.christopher...@intel.com Signed-off-by: Ingo Molnar --- arch/x86/entry/vdso/vdso-layout.lds.S | 5 - 1 file changed, 5 deletions(-) diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S index 0cedc905c8d6..93c6dc7812d0 100644 --- a/arch/x86/entry/vdso/vdso-layout.lds.S +++ b/arch/x86/entry/vdso/vdso-layout.lds.S @@ -65,11 +65,6 @@ SECTIONS .text : { *(.text*) } :text =0x90909090, - /* -* At the end so that eu-elflint stays happy when vdso2c strips -* these. A better implementation would avoid allocating space -* for these. -*/ .altinstructions: { *(.altinstructions) } :text .altinstr_replacement : { *(.altinstr_replacement) } :text
[tip:x86/mm] generic/pgtable: Introduce {p4d,pgd}_same()
Commit-ID: 0cebbb60f759a709dabb3c87b9704f9844878850 Gitweb: https://git.kernel.org/tip/0cebbb60f759a709dabb3c87b9704f9844878850 Author: Dan Williams AuthorDate: Tue, 4 Dec 2018 13:37:11 -0800 Committer: Ingo Molnar CommitDate: Wed, 5 Dec 2018 09:03:06 +0100 generic/pgtable: Introduce {p4d,pgd}_same() In preparation for introducing '_safe' versions of page table entry 'set' helpers, introduce generic versions of p4d_same() and pgd_same(). Signed-off-by: Dan Williams Acked-by: Kirill A. Shutemov Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Sebastian Andrzej Siewior Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/154395943153.32119.1733586547359626311.st...@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar --- include/asm-generic/pgtable.h | 14 ++ 1 file changed, 14 insertions(+) diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index eea50ef8b8cd..dae7f98babed 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -386,6 +386,20 @@ static inline int pud_same(pud_t pud_a, pud_t pud_b) } #endif +#ifndef __HAVE_ARCH_P4D_SAME +static inline int p4d_same(p4d_t p4d_a, p4d_t p4d_b) +{ + return p4d_val(p4d_a) == p4d_val(p4d_b); +} +#endif + +#ifndef __HAVE_ARCH_PGD_SAME +static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b) +{ + return pgd_val(pgd_a) == pgd_val(pgd_b); +} +#endif + #ifndef __HAVE_ARCH_DO_SWAP_PAGE /* * Some architectures support metadata associated with a page. When a
[tip:x86/mm] generic/pgtable: Introduce {p4d,pgd}_same()
Commit-ID: 0cebbb60f759a709dabb3c87b9704f9844878850 Gitweb: https://git.kernel.org/tip/0cebbb60f759a709dabb3c87b9704f9844878850 Author: Dan Williams AuthorDate: Tue, 4 Dec 2018 13:37:11 -0800 Committer: Ingo Molnar CommitDate: Wed, 5 Dec 2018 09:03:06 +0100 generic/pgtable: Introduce {p4d,pgd}_same() In preparation for introducing '_safe' versions of page table entry 'set' helpers, introduce generic versions of p4d_same() and pgd_same(). Signed-off-by: Dan Williams Acked-by: Kirill A. Shutemov Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Sebastian Andrzej Siewior Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/154395943153.32119.1733586547359626311.st...@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar --- include/asm-generic/pgtable.h | 14 ++ 1 file changed, 14 insertions(+) diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index eea50ef8b8cd..dae7f98babed 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -386,6 +386,20 @@ static inline int pud_same(pud_t pud_a, pud_t pud_b) } #endif +#ifndef __HAVE_ARCH_P4D_SAME +static inline int p4d_same(p4d_t p4d_a, p4d_t p4d_b) +{ + return p4d_val(p4d_a) == p4d_val(p4d_b); +} +#endif + +#ifndef __HAVE_ARCH_PGD_SAME +static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b) +{ + return pgd_val(pgd_a) == pgd_val(pgd_b); +} +#endif + #ifndef __HAVE_ARCH_DO_SWAP_PAGE /* * Some architectures support metadata associated with a page. When a
[tip:x86/mm] generic/pgtable: Make {pmd, pud}_same() unconditionally available
Commit-ID: c683c37cd13246941924c48f6c6a9863425e0cec Gitweb: https://git.kernel.org/tip/c683c37cd13246941924c48f6c6a9863425e0cec Author: Dan Williams AuthorDate: Tue, 4 Dec 2018 13:37:06 -0800 Committer: Ingo Molnar CommitDate: Wed, 5 Dec 2018 09:03:05 +0100 generic/pgtable: Make {pmd, pud}_same() unconditionally available In preparation for {pmd,pud}_same() to be used outside of transparent huge page code paths, make them unconditionally available. This enables them to be used in the definition of a new family of set_{pte,pmd,pud,p4d,pgd}_safe() helpers. Signed-off-by: Dan Williams Acked-by: Kirill A. Shutemov Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Sebastian Andrzej Siewior Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/154395942644.32119.10238934183949418128.st...@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar --- include/asm-generic/pgtable.h | 14 -- 1 file changed, 14 deletions(-) diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index 359fb935ded6..eea50ef8b8cd 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -375,7 +375,6 @@ static inline int pte_unused(pte_t pte) #endif #ifndef __HAVE_ARCH_PMD_SAME -#ifdef CONFIG_TRANSPARENT_HUGEPAGE static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b) { return pmd_val(pmd_a) == pmd_val(pmd_b); @@ -385,19 +384,6 @@ static inline int pud_same(pud_t pud_a, pud_t pud_b) { return pud_val(pud_a) == pud_val(pud_b); } -#else /* CONFIG_TRANSPARENT_HUGEPAGE */ -static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b) -{ - BUILD_BUG(); - return 0; -} - -static inline int pud_same(pud_t pud_a, pud_t pud_b) -{ - BUILD_BUG(); - return 0; -} -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #endif #ifndef __HAVE_ARCH_DO_SWAP_PAGE
[tip:x86/mm] generic/pgtable: Make {pmd, pud}_same() unconditionally available
Commit-ID: c683c37cd13246941924c48f6c6a9863425e0cec Gitweb: https://git.kernel.org/tip/c683c37cd13246941924c48f6c6a9863425e0cec Author: Dan Williams AuthorDate: Tue, 4 Dec 2018 13:37:06 -0800 Committer: Ingo Molnar CommitDate: Wed, 5 Dec 2018 09:03:05 +0100 generic/pgtable: Make {pmd, pud}_same() unconditionally available In preparation for {pmd,pud}_same() to be used outside of transparent huge page code paths, make them unconditionally available. This enables them to be used in the definition of a new family of set_{pte,pmd,pud,p4d,pgd}_safe() helpers. Signed-off-by: Dan Williams Acked-by: Kirill A. Shutemov Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Sebastian Andrzej Siewior Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/154395942644.32119.10238934183949418128.st...@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar --- include/asm-generic/pgtable.h | 14 -- 1 file changed, 14 deletions(-) diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index 359fb935ded6..eea50ef8b8cd 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -375,7 +375,6 @@ static inline int pte_unused(pte_t pte) #endif #ifndef __HAVE_ARCH_PMD_SAME -#ifdef CONFIG_TRANSPARENT_HUGEPAGE static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b) { return pmd_val(pmd_a) == pmd_val(pmd_b); @@ -385,19 +384,6 @@ static inline int pud_same(pud_t pud_a, pud_t pud_b) { return pud_val(pud_a) == pud_val(pud_b); } -#else /* CONFIG_TRANSPARENT_HUGEPAGE */ -static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b) -{ - BUILD_BUG(); - return 0; -} - -static inline int pud_same(pud_t pud_a, pud_t pud_b) -{ - BUILD_BUG(); - return 0; -} -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #endif #ifndef __HAVE_ARCH_DO_SWAP_PAGE
[tip:x86/asm] x86/vdso: Remove obsolete "fake section table" reservation
Commit-ID: 24b7c77bbb24f129beead93574ff248c3db1288e Gitweb: https://git.kernel.org/tip/24b7c77bbb24f129beead93574ff248c3db1288e Author: Sean Christopherson AuthorDate: Tue, 4 Dec 2018 13:25:57 -0800 Committer: Ingo Molnar CommitDate: Wed, 5 Dec 2018 08:58:11 +0100 x86/vdso: Remove obsolete "fake section table" reservation At one point the vDSO image was manually stripped down by vdso2c in an attempt to minimize the size of the image mapped into userspace. Part of that stripping process involved building a fake section table so as not to break userspace processes that parse the section table. Memory for the fake section table was reserved in the .rodata section so that vdso2c could simply copy the entire PT_LOAD segment into the userspace image after building the fake table. Eventually, the entire fake section table approach was dropped in favor of stripping the vdso "the old fashioned way", i.e. via objdump -S. But, the reservation in .rodata for the fake table was left behind. Remove the reserveration along with a few other related defines and section entries. Removing the fake section table placeholder zaps a whopping 0x340 bytes from the 64-bit vDSO image, which drops the current image's size to under 4k, i.e. reduces the effective size of the userspace vDSO mapping by a full page. Signed-off-by: Sean Christopherson Acked-by: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Thomas Gleixner Fixes: da861e18eccc ("x86, vdso: Get rid of the fake section mechanism") Link: http://lkml.kernel.org/r/20181204212600.28090-2-sean.j.christopher...@intel.com Signed-off-by: Ingo Molnar --- arch/x86/entry/vdso/vdso-layout.lds.S | 22 -- arch/x86/entry/vdso/vdso2c.c | 8 2 files changed, 30 deletions(-) diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S index acfd5ba7d943..0cedc905c8d6 100644 --- a/arch/x86/entry/vdso/vdso-layout.lds.S +++ b/arch/x86/entry/vdso/vdso-layout.lds.S @@ -7,16 +7,6 @@ * This script controls its layout. */ -#if defined(BUILD_VDSO64) -# define SHDR_SIZE 64 -#elif defined(BUILD_VDSO32) || defined(BUILD_VDSOX32) -# define SHDR_SIZE 40 -#else -# error unknown VDSO target -#endif - -#define NUM_FAKE_SHDRS 13 - SECTIONS { /* @@ -60,20 +50,8 @@ SECTIONS *(.bss*) *(.dynbss*) *(.gnu.linkonce.b.*) - - /* -* Ideally this would live in a C file, but that won't -* work cleanly for x32 until we start building the x32 -* C code using an x32 toolchain. -*/ - VDSO_FAKE_SECTION_TABLE_START = .; - . = . + NUM_FAKE_SHDRS * SHDR_SIZE; - VDSO_FAKE_SECTION_TABLE_END = .; } :text - .fake_shstrtab : { *(.fake_shstrtab) } :text - - .note : { *(.note.*) }:text :note .eh_frame_hdr : { *(.eh_frame_hdr) } :text :eh_frame_hdr diff --git a/arch/x86/entry/vdso/vdso2c.c b/arch/x86/entry/vdso/vdso2c.c index 4674f58581a1..8e470b018512 100644 --- a/arch/x86/entry/vdso/vdso2c.c +++ b/arch/x86/entry/vdso/vdso2c.c @@ -76,8 +76,6 @@ enum { sym_hpet_page, sym_pvclock_page, sym_hvclock_page, - sym_VDSO_FAKE_SECTION_TABLE_START, - sym_VDSO_FAKE_SECTION_TABLE_END, }; const int special_pages[] = { @@ -98,12 +96,6 @@ struct vdso_sym required_syms[] = { [sym_hpet_page] = {"hpet_page", true}, [sym_pvclock_page] = {"pvclock_page", true}, [sym_hvclock_page] = {"hvclock_page", true}, - [sym_VDSO_FAKE_SECTION_TABLE_START] = { - "VDSO_FAKE_SECTION_TABLE_START", false - }, - [sym_VDSO_FAKE_SECTION_TABLE_END] = { - "VDSO_FAKE_SECTION_TABLE_END", false - }, {"VDSO32_NOTE_MASK", true}, {"__kernel_vsyscall", true}, {"__kernel_sigreturn", true},
[tip:x86/asm] x86/vdso: Remove obsolete "fake section table" reservation
Commit-ID: 24b7c77bbb24f129beead93574ff248c3db1288e Gitweb: https://git.kernel.org/tip/24b7c77bbb24f129beead93574ff248c3db1288e Author: Sean Christopherson AuthorDate: Tue, 4 Dec 2018 13:25:57 -0800 Committer: Ingo Molnar CommitDate: Wed, 5 Dec 2018 08:58:11 +0100 x86/vdso: Remove obsolete "fake section table" reservation At one point the vDSO image was manually stripped down by vdso2c in an attempt to minimize the size of the image mapped into userspace. Part of that stripping process involved building a fake section table so as not to break userspace processes that parse the section table. Memory for the fake section table was reserved in the .rodata section so that vdso2c could simply copy the entire PT_LOAD segment into the userspace image after building the fake table. Eventually, the entire fake section table approach was dropped in favor of stripping the vdso "the old fashioned way", i.e. via objdump -S. But, the reservation in .rodata for the fake table was left behind. Remove the reserveration along with a few other related defines and section entries. Removing the fake section table placeholder zaps a whopping 0x340 bytes from the 64-bit vDSO image, which drops the current image's size to under 4k, i.e. reduces the effective size of the userspace vDSO mapping by a full page. Signed-off-by: Sean Christopherson Acked-by: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Thomas Gleixner Fixes: da861e18eccc ("x86, vdso: Get rid of the fake section mechanism") Link: http://lkml.kernel.org/r/20181204212600.28090-2-sean.j.christopher...@intel.com Signed-off-by: Ingo Molnar --- arch/x86/entry/vdso/vdso-layout.lds.S | 22 -- arch/x86/entry/vdso/vdso2c.c | 8 2 files changed, 30 deletions(-) diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S index acfd5ba7d943..0cedc905c8d6 100644 --- a/arch/x86/entry/vdso/vdso-layout.lds.S +++ b/arch/x86/entry/vdso/vdso-layout.lds.S @@ -7,16 +7,6 @@ * This script controls its layout. */ -#if defined(BUILD_VDSO64) -# define SHDR_SIZE 64 -#elif defined(BUILD_VDSO32) || defined(BUILD_VDSOX32) -# define SHDR_SIZE 40 -#else -# error unknown VDSO target -#endif - -#define NUM_FAKE_SHDRS 13 - SECTIONS { /* @@ -60,20 +50,8 @@ SECTIONS *(.bss*) *(.dynbss*) *(.gnu.linkonce.b.*) - - /* -* Ideally this would live in a C file, but that won't -* work cleanly for x32 until we start building the x32 -* C code using an x32 toolchain. -*/ - VDSO_FAKE_SECTION_TABLE_START = .; - . = . + NUM_FAKE_SHDRS * SHDR_SIZE; - VDSO_FAKE_SECTION_TABLE_END = .; } :text - .fake_shstrtab : { *(.fake_shstrtab) } :text - - .note : { *(.note.*) }:text :note .eh_frame_hdr : { *(.eh_frame_hdr) } :text :eh_frame_hdr diff --git a/arch/x86/entry/vdso/vdso2c.c b/arch/x86/entry/vdso/vdso2c.c index 4674f58581a1..8e470b018512 100644 --- a/arch/x86/entry/vdso/vdso2c.c +++ b/arch/x86/entry/vdso/vdso2c.c @@ -76,8 +76,6 @@ enum { sym_hpet_page, sym_pvclock_page, sym_hvclock_page, - sym_VDSO_FAKE_SECTION_TABLE_START, - sym_VDSO_FAKE_SECTION_TABLE_END, }; const int special_pages[] = { @@ -98,12 +96,6 @@ struct vdso_sym required_syms[] = { [sym_hpet_page] = {"hpet_page", true}, [sym_pvclock_page] = {"pvclock_page", true}, [sym_hvclock_page] = {"hvclock_page", true}, - [sym_VDSO_FAKE_SECTION_TABLE_START] = { - "VDSO_FAKE_SECTION_TABLE_START", false - }, - [sym_VDSO_FAKE_SECTION_TABLE_END] = { - "VDSO_FAKE_SECTION_TABLE_END", false - }, {"VDSO32_NOTE_MASK", true}, {"__kernel_vsyscall", true}, {"__kernel_sigreturn", true},
Re: [PATCH v1 5/5] perf cs-etm: Track exception number
On Tue, 4 Dec 2018 at 20:49, wrote: > > On Mon, Nov 19, 2018 at 01:47:49PM -0700, Mathieu Poirier wrote: > > On Sun, Nov 11, 2018 at 12:59:43PM +0800, Leo Yan wrote: > > > When an exception packet comes, it contains the info for exception > > > number; the exception number indicates the exception types, so from it > > > we can know if the exception is taken for interrupt, system call or > > > other traps, etc. But because the exception return packet cannot > > > delivery exception number correctly by decoder thus when prepare sample > > > flags we cannot know what's type for exception return. > > > > > > This patch adds a new 'exc_num' array in decoder structure to record > > > exception number per CPU, the exception number is recorded in the array > > > when the exception packet comes and this exception number can be used by > > > exception return packet. If detect there have discontinuous trace with > > > TRACE_ON or TRACE_OFF packet, the exception number is set to invalid > > > value. > > > > > > Signed-off-by: Leo Yan > > > --- > > > tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 67 > > > ++--- > > > 1 file changed, 59 insertions(+), 8 deletions(-) > > > > > > diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c > > > b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c > > > index b8cb7a3e..d1a6cbc 100644 > > > --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c > > > +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c > > > @@ -43,6 +43,7 @@ struct cs_etm_decoder { > > > u32 packet_count; > > > u32 head; > > > u32 tail; > > > + u32 *exc_num; > > > struct cs_etm_packet packet_buffer[MAX_BUFFER]; > > > }; > > > > > > @@ -368,24 +369,64 @@ static ocsd_datapath_resp_t > > > cs_etm_decoder__buffer_trace_off(struct cs_etm_decoder *decoder, > > > const uint8_t trace_chan_id) > > > { > > > - return cs_etm_decoder__buffer_packet(decoder, trace_chan_id, > > > -CS_ETM_TRACE_OFF); > > > + int ret; > > > + struct cs_etm_packet *packet; > > > + > > > + ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id, > > > + CS_ETM_TRACE_OFF); > > > + if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT) > > > + return ret; > > > + > > > + packet = >packet_buffer[decoder->tail]; > > > + > > > + /* Clear execption number for discontinuous trace */ > > > + decoder->exc_num[packet->cpu] = UINT32_MAX; > > > + > > > + return ret; > > > } > > > > > > static ocsd_datapath_resp_t > > > cs_etm_decoder__buffer_trace_on(struct cs_etm_decoder *decoder, > > > const uint8_t trace_chan_id) > > > { > > > - return cs_etm_decoder__buffer_packet(decoder, trace_chan_id, > > > -CS_ETM_TRACE_ON); > > > + int ret; > > > + struct cs_etm_packet *packet; > > > + > > > + ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id, > > > + CS_ETM_TRACE_ON); > > > + if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT) > > > + return ret; > > > + > > > + packet = >packet_buffer[decoder->tail]; > > > + > > > + /* Clear execption number for discontinuous trace */ > > > + decoder->exc_num[packet->cpu] = UINT32_MAX; > > > + > > > + return ret; > > > } > > > > > > static ocsd_datapath_resp_t > > > cs_etm_decoder__buffer_exception(struct cs_etm_decoder *decoder, > > > +const ocsd_generic_trace_elem *elem, > > > const uint8_t trace_chan_id) > > > { > > > - return cs_etm_decoder__buffer_packet(decoder, trace_chan_id, > > > -CS_ETM_EXCEPTION); > > > + int ret; > > > + struct cs_etm_packet *packet; > > > + > > > + ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id, > > > + CS_ETM_EXCEPTION); > > > + if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT) > > > + return ret; > > > + > > > + packet = >packet_buffer[decoder->tail]; > > > + > > > + /* > > > +* Exception number is recorded per CPU and later can be used > > > +* for exception return instruction analysis. > > > +*/ > > > + decoder->exc_num[packet->cpu] = elem->exception_number; > > > > Am I missing something or the information about the exception number that is > > recorded here isn't used anywhere? > > The exception number will be used to set branch flag patch [1]. Right, I realised that when I started reviewing that set. The rule of thumb here is to introduce code in the same patchset it is used so that we avoid adding needless code to the kernel. > According to exception number we can know it's for system call, > interrupt or other traps. > > [1] > http://archive.armlinux.org.uk/lurker/message/2018.050755.d1c1b257.en.html > > > If you want to use this in perf report/script, > > the
Re: [PATCH v1 5/5] perf cs-etm: Track exception number
On Tue, 4 Dec 2018 at 20:49, wrote: > > On Mon, Nov 19, 2018 at 01:47:49PM -0700, Mathieu Poirier wrote: > > On Sun, Nov 11, 2018 at 12:59:43PM +0800, Leo Yan wrote: > > > When an exception packet comes, it contains the info for exception > > > number; the exception number indicates the exception types, so from it > > > we can know if the exception is taken for interrupt, system call or > > > other traps, etc. But because the exception return packet cannot > > > delivery exception number correctly by decoder thus when prepare sample > > > flags we cannot know what's type for exception return. > > > > > > This patch adds a new 'exc_num' array in decoder structure to record > > > exception number per CPU, the exception number is recorded in the array > > > when the exception packet comes and this exception number can be used by > > > exception return packet. If detect there have discontinuous trace with > > > TRACE_ON or TRACE_OFF packet, the exception number is set to invalid > > > value. > > > > > > Signed-off-by: Leo Yan > > > --- > > > tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 67 > > > ++--- > > > 1 file changed, 59 insertions(+), 8 deletions(-) > > > > > > diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c > > > b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c > > > index b8cb7a3e..d1a6cbc 100644 > > > --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c > > > +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c > > > @@ -43,6 +43,7 @@ struct cs_etm_decoder { > > > u32 packet_count; > > > u32 head; > > > u32 tail; > > > + u32 *exc_num; > > > struct cs_etm_packet packet_buffer[MAX_BUFFER]; > > > }; > > > > > > @@ -368,24 +369,64 @@ static ocsd_datapath_resp_t > > > cs_etm_decoder__buffer_trace_off(struct cs_etm_decoder *decoder, > > > const uint8_t trace_chan_id) > > > { > > > - return cs_etm_decoder__buffer_packet(decoder, trace_chan_id, > > > -CS_ETM_TRACE_OFF); > > > + int ret; > > > + struct cs_etm_packet *packet; > > > + > > > + ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id, > > > + CS_ETM_TRACE_OFF); > > > + if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT) > > > + return ret; > > > + > > > + packet = >packet_buffer[decoder->tail]; > > > + > > > + /* Clear execption number for discontinuous trace */ > > > + decoder->exc_num[packet->cpu] = UINT32_MAX; > > > + > > > + return ret; > > > } > > > > > > static ocsd_datapath_resp_t > > > cs_etm_decoder__buffer_trace_on(struct cs_etm_decoder *decoder, > > > const uint8_t trace_chan_id) > > > { > > > - return cs_etm_decoder__buffer_packet(decoder, trace_chan_id, > > > -CS_ETM_TRACE_ON); > > > + int ret; > > > + struct cs_etm_packet *packet; > > > + > > > + ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id, > > > + CS_ETM_TRACE_ON); > > > + if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT) > > > + return ret; > > > + > > > + packet = >packet_buffer[decoder->tail]; > > > + > > > + /* Clear execption number for discontinuous trace */ > > > + decoder->exc_num[packet->cpu] = UINT32_MAX; > > > + > > > + return ret; > > > } > > > > > > static ocsd_datapath_resp_t > > > cs_etm_decoder__buffer_exception(struct cs_etm_decoder *decoder, > > > +const ocsd_generic_trace_elem *elem, > > > const uint8_t trace_chan_id) > > > { > > > - return cs_etm_decoder__buffer_packet(decoder, trace_chan_id, > > > -CS_ETM_EXCEPTION); > > > + int ret; > > > + struct cs_etm_packet *packet; > > > + > > > + ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id, > > > + CS_ETM_EXCEPTION); > > > + if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT) > > > + return ret; > > > + > > > + packet = >packet_buffer[decoder->tail]; > > > + > > > + /* > > > +* Exception number is recorded per CPU and later can be used > > > +* for exception return instruction analysis. > > > +*/ > > > + decoder->exc_num[packet->cpu] = elem->exception_number; > > > > Am I missing something or the information about the exception number that is > > recorded here isn't used anywhere? > > The exception number will be used to set branch flag patch [1]. Right, I realised that when I started reviewing that set. The rule of thumb here is to introduce code in the same patchset it is used so that we avoid adding needless code to the kernel. > According to exception number we can know it's for system call, > interrupt or other traps. > > [1] > http://archive.armlinux.org.uk/lurker/message/2018.050755.d1c1b257.en.html > > > If you want to use this in perf report/script, > > the
Re: [PATCH] x86/mce: Streamline MCE subsystem's naming
* Borislav Petkov wrote: > On Wed, Dec 05, 2018 at 05:30:37PM +0100, Ingo Molnar wrote: > > Would it make sense to organize it a bit more and separate out vendor > > specific functionality: > > > > mce/cpu/intel.c > > mce/cpu/intel-p5.c > > mce/cpu/amd.c > > mce/cpu/winchip.c > > That's too fine-grained IMO and look at the path we'd get then: > > arch/x86/kernel/cpu/mce/cpu/intel.c > ^^^ ^^^ Oh - I thought we'd have arch/x86/mce/ or so? There's machine check events that are not tied to any particular CPU, correct? So this would be the right conceptual level - and it would also remove the somewhat redundant 'kernel' part. Thanks, Ingo
Re: [PATCH] x86/mce: Streamline MCE subsystem's naming
* Borislav Petkov wrote: > On Wed, Dec 05, 2018 at 05:30:37PM +0100, Ingo Molnar wrote: > > Would it make sense to organize it a bit more and separate out vendor > > specific functionality: > > > > mce/cpu/intel.c > > mce/cpu/intel-p5.c > > mce/cpu/amd.c > > mce/cpu/winchip.c > > That's too fine-grained IMO and look at the path we'd get then: > > arch/x86/kernel/cpu/mce/cpu/intel.c > ^^^ ^^^ Oh - I thought we'd have arch/x86/mce/ or so? There's machine check events that are not tied to any particular CPU, correct? So this would be the right conceptual level - and it would also remove the somewhat redundant 'kernel' part. Thanks, Ingo
Re: [RFC PATCH 02/14] mm/hms: heterogenenous memory system (HMS) documentation
On Wed, Dec 05, 2018 at 10:25:31AM -0700, Logan Gunthorpe wrote: > > > On 2018-12-04 7:37 p.m., Jerome Glisse wrote: > >> > >> This came up before for apis even better defined than HMS as well as > >> more limited scope, i.e. experimental ABI availability only for -rc > >> kernels. Linus said this: > >> > >> "There are no loopholes. No "but it's been only one release". No, no, > >> no. The whole point is that users are supposed to be able to *trust* > >> the kernel. If we do something, we keep on doing it. > >> > >> And if it makes it harder to add new user-visible interfaces, then > >> that's a *good* thing." [1] > >> > >> The takeaway being don't land work-in-progress ABIs in the kernel. > >> Once an application depends on it, there are no more incompatible > >> changes possible regardless of the warnings, experimental notices, or > >> "staging" designation. DAX is experimental because there are cases > >> where it currently does not work with respect to another kernel > >> feature like xfs-reflink, RDMA. The plan is to fix those, not continue > >> to hide behind an experimental designation, and fix them in a way that > >> preserves the user visible behavior that has already been exposed, > >> i.e. no regressions. > >> > >> [1]: > >> https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-August/004742.html > > > > So i guess i am heading down the vXX road ... such is my life :) > > I recommend against it. I really haven't been convinced by any of your > arguments for having a second topology tree. The existing topology tree > in sysfs already better describes the links between hardware right now, > except for the missing GPU links (and those should be addressable within > the GPU community). Plus, maybe, some other enhancements to sockets/numa > node descriptions if there's something missing there. > > Then, 'hbind' is another issue but I suspect it would be better > implemented as an ioctl on existing GPU interfaces. I certainly can't > see any benefit in using it myself. > > It's better to take an approach that would be less controversial with > the community than to brow beat them with a patch set 20+ times until > they take it. So here is what i am gonna do because i need this code now. I am gonna split the helper code that does policy and hbind out from its sysfs peerage and i am gonna turn it into helpers that each device driver can use. I will move the sysfs and syscall to be a patchset on its own which use the exact same above infrastructure. This means that i am loosing feature as it means that userspace can not provide a list of multiple device memory to use (which is much more common that you might think) but at least i can provide something for the single device case through ioctl. I am not giving up on sysfs or syscall as this is needed long term so i am gonna improve it, port existing userspace (OpenCL, ROCm, ...) to use it (in branch) and demonstrate how it get use by end application. I will beat it again and again until either i convince people through hard evidence or i get bored. I do not get bored easily :) Cheers, Jérôme
Re: [RFC PATCH 02/14] mm/hms: heterogenenous memory system (HMS) documentation
On Wed, Dec 05, 2018 at 10:25:31AM -0700, Logan Gunthorpe wrote: > > > On 2018-12-04 7:37 p.m., Jerome Glisse wrote: > >> > >> This came up before for apis even better defined than HMS as well as > >> more limited scope, i.e. experimental ABI availability only for -rc > >> kernels. Linus said this: > >> > >> "There are no loopholes. No "but it's been only one release". No, no, > >> no. The whole point is that users are supposed to be able to *trust* > >> the kernel. If we do something, we keep on doing it. > >> > >> And if it makes it harder to add new user-visible interfaces, then > >> that's a *good* thing." [1] > >> > >> The takeaway being don't land work-in-progress ABIs in the kernel. > >> Once an application depends on it, there are no more incompatible > >> changes possible regardless of the warnings, experimental notices, or > >> "staging" designation. DAX is experimental because there are cases > >> where it currently does not work with respect to another kernel > >> feature like xfs-reflink, RDMA. The plan is to fix those, not continue > >> to hide behind an experimental designation, and fix them in a way that > >> preserves the user visible behavior that has already been exposed, > >> i.e. no regressions. > >> > >> [1]: > >> https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-August/004742.html > > > > So i guess i am heading down the vXX road ... such is my life :) > > I recommend against it. I really haven't been convinced by any of your > arguments for having a second topology tree. The existing topology tree > in sysfs already better describes the links between hardware right now, > except for the missing GPU links (and those should be addressable within > the GPU community). Plus, maybe, some other enhancements to sockets/numa > node descriptions if there's something missing there. > > Then, 'hbind' is another issue but I suspect it would be better > implemented as an ioctl on existing GPU interfaces. I certainly can't > see any benefit in using it myself. > > It's better to take an approach that would be less controversial with > the community than to brow beat them with a patch set 20+ times until > they take it. So here is what i am gonna do because i need this code now. I am gonna split the helper code that does policy and hbind out from its sysfs peerage and i am gonna turn it into helpers that each device driver can use. I will move the sysfs and syscall to be a patchset on its own which use the exact same above infrastructure. This means that i am loosing feature as it means that userspace can not provide a list of multiple device memory to use (which is much more common that you might think) but at least i can provide something for the single device case through ioctl. I am not giving up on sysfs or syscall as this is needed long term so i am gonna improve it, port existing userspace (OpenCL, ROCm, ...) to use it (in branch) and demonstrate how it get use by end application. I will beat it again and again until either i convince people through hard evidence or i get bored. I do not get bored easily :) Cheers, Jérôme
Re: [PATCH v6 0/8] AXP8x3 AC and battery power supply support
Hi, On Fri, Nov 30, 2018 at 04:31:28PM +0800, Chen-Yu Tsai wrote: > On Wed, Nov 21, 2018 at 1:52 AM Oskari Lemmela wrote: > > > > AXP813 AC power supply support with input current and > > voltage limiting support. > > > > AXP803 AC and battery power supply support. > > > > Changes in v6: > > * Collected tags > > * Rebase to master > > * Dropped AXP803 compatible patches > > > > Changes in v5: > > * Return correct input current limit for values 0x6 and 0x7 > > * Add specific compatibles for AXP803 > > * Change commit messages > > * Add Vasily Khoruzhick Pinebook DTS patch > > > > Changes in v4: > > * Change order of axp20x-gpio in axp20x.c > > * Fix indentation and spaces to tabs in axp20x.c > > > > Changes in v3: > > * Reorder ac_power_supply DT node > > * Rename axp20x_ac_power_set_property function > > * Split mfd commit > > > > Changes in v2: > > * Reuse axp813 compatibles for axp803 > > * Refactor axp20x_ac_power.c > > > > > > Oskari Lemmela (7): > > dt-bindings: power: supply: axp20x: add AXP813 AC power DT binding > > ARM: dts: axp81x: add AC power supply subnode > > arm64: dts: allwinner: axp803: add AC and battery power supplies > > arm64: dts: allwinner: a64: sopine-baseboard: enable power supplies > > power: supply: add AC power supply driver for AXP813 > > mfd: axp20x: Add AC power supply cell for AXP813 > > mfd: axp20x: Add supported cells for AXP803 > > > > Vasily Khoruzhick (1): > > arm64: dts: allwinner: a64: pinebook: enable power supplies > > Since this series has been fully reviewed, and the devices it > adds/enables aren't critical, i.e. won't cause the system to fail > if the drivers are missing, I've merged the dts patches for 4.21. > > Hopefully Sebastian and Lee will get around to merging the driver > patches. I queued the power-supply driver changes and prepared a signed immutable tag for Lee: The following changes since commit 651022382c7f8da46cb4872a545ee1da6d097d2a: Linux 4.20-rc1 (2018-11-04 15:37:52 -0800) are available in the Git repository at: ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply.git tags/psy-mfd-axp813-immutable-for-v4.21-signed for you to fetch changes up to 7693b5643fd2d682de90733b67fc8032b9646911: power: supply: add AC power supply driver for AXP813 (2018-12-05 18:49:32 +0100) Immutable branch between mfd and power-supply for driver changes related to axp813. Signed-off-by: Sebastian Reichel Oskari Lemmela (2): dt-bindings: power: supply: axp20x: add AXP813 AC power DT binding power: supply: add AC power supply driver for AXP813 .../bindings/power/supply/axp20x_ac_power.txt | 3 + drivers/power/supply/axp20x_ac_power.c | 94 ++ include/linux/mfd/axp20x.h | 1 + 3 files changed, 98 insertions(+) signature.asc Description: PGP signature
Re: [PATCH v6 0/8] AXP8x3 AC and battery power supply support
Hi, On Fri, Nov 30, 2018 at 04:31:28PM +0800, Chen-Yu Tsai wrote: > On Wed, Nov 21, 2018 at 1:52 AM Oskari Lemmela wrote: > > > > AXP813 AC power supply support with input current and > > voltage limiting support. > > > > AXP803 AC and battery power supply support. > > > > Changes in v6: > > * Collected tags > > * Rebase to master > > * Dropped AXP803 compatible patches > > > > Changes in v5: > > * Return correct input current limit for values 0x6 and 0x7 > > * Add specific compatibles for AXP803 > > * Change commit messages > > * Add Vasily Khoruzhick Pinebook DTS patch > > > > Changes in v4: > > * Change order of axp20x-gpio in axp20x.c > > * Fix indentation and spaces to tabs in axp20x.c > > > > Changes in v3: > > * Reorder ac_power_supply DT node > > * Rename axp20x_ac_power_set_property function > > * Split mfd commit > > > > Changes in v2: > > * Reuse axp813 compatibles for axp803 > > * Refactor axp20x_ac_power.c > > > > > > Oskari Lemmela (7): > > dt-bindings: power: supply: axp20x: add AXP813 AC power DT binding > > ARM: dts: axp81x: add AC power supply subnode > > arm64: dts: allwinner: axp803: add AC and battery power supplies > > arm64: dts: allwinner: a64: sopine-baseboard: enable power supplies > > power: supply: add AC power supply driver for AXP813 > > mfd: axp20x: Add AC power supply cell for AXP813 > > mfd: axp20x: Add supported cells for AXP803 > > > > Vasily Khoruzhick (1): > > arm64: dts: allwinner: a64: pinebook: enable power supplies > > Since this series has been fully reviewed, and the devices it > adds/enables aren't critical, i.e. won't cause the system to fail > if the drivers are missing, I've merged the dts patches for 4.21. > > Hopefully Sebastian and Lee will get around to merging the driver > patches. I queued the power-supply driver changes and prepared a signed immutable tag for Lee: The following changes since commit 651022382c7f8da46cb4872a545ee1da6d097d2a: Linux 4.20-rc1 (2018-11-04 15:37:52 -0800) are available in the Git repository at: ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply.git tags/psy-mfd-axp813-immutable-for-v4.21-signed for you to fetch changes up to 7693b5643fd2d682de90733b67fc8032b9646911: power: supply: add AC power supply driver for AXP813 (2018-12-05 18:49:32 +0100) Immutable branch between mfd and power-supply for driver changes related to axp813. Signed-off-by: Sebastian Reichel Oskari Lemmela (2): dt-bindings: power: supply: axp20x: add AXP813 AC power DT binding power: supply: add AC power supply driver for AXP813 .../bindings/power/supply/axp20x_ac_power.txt | 3 + drivers/power/supply/axp20x_ac_power.c | 94 ++ include/linux/mfd/axp20x.h | 1 + 3 files changed, 98 insertions(+) signature.asc Description: PGP signature
Re: edac driver injection of uncorrected errors & utils
On 12/5/18 8:38 AM, Tracy Smith wrote: > This is more directed toward York for layerscape. I see some edac code > that seem to do periodic scrubs based on intervals or scrub rate, but > that is not needed for the layerscape driver to correct errors because > errors are scrubbed when found by the edac scrub block or is it > because the memory controller itself does the correction/scrubbing > when an error is found? Single-bit errors are corrected by memory controller without involving software. York
Re: edac driver injection of uncorrected errors & utils
On 12/5/18 8:38 AM, Tracy Smith wrote: > This is more directed toward York for layerscape. I see some edac code > that seem to do periodic scrubs based on intervals or scrub rate, but > that is not needed for the layerscape driver to correct errors because > errors are scrubbed when found by the edac scrub block or is it > because the memory controller itself does the correction/scrubbing > when an error is found? Single-bit errors are corrected by memory controller without involving software. York
Re: [PATCH] auxdisplay: charlcd: fix x/y command parsing
Miguel Ojeda writes: > Hi Mans, > > [CC'ing a few people involved previously on this] > > On Wed, Dec 5, 2018 at 2:53 PM Mans Rullgard wrote: >> >> Commit b34050fadb86 ("auxdisplay: charlcd: Fix and clean up handling of >> x/y commands") fixed some problems by rewriting the parsing code, >> but also broke things further by removing the check for a complete >> command before attempting to parse it. As a result, parsing is >> terminated at the first x or y character. > > Why is it necessary to check for ";" to be exactly at the end? Or, in > other words, what requires it? > > If the syntax supported by parse_xy() is wrong for some reason, we > should fix that (instead of adding a check before calling it). Suppose the command "\e[Lx0y0;" is written to the device. The charlcd_write_char() function adds one character at a time to the escape sequence buffer. Once the 'L' has been seen, it calls handle_lcd_special_code() after each additional character is added. On encountering the 'x' this function will attempt to parse the command using parse_xy(), which fails since it is incomplete. It is nonetheless reported as processed, and the escape sequence is flushed. The remainder of the command (i.e. "0y0;") is then displayed as text. Since parse_xy() can never return success (true) unless there is a semicolon, it is pointless to call it until we have seen one. With the characters being added to the buffer one by one, it is enough to check for semicolon at the end. BTW, the parsing of this command has been broken since 3.2-rc1 due to commit 129957069e6a. -- Måns Rullgård
Re: [PATCH] auxdisplay: charlcd: fix x/y command parsing
Miguel Ojeda writes: > Hi Mans, > > [CC'ing a few people involved previously on this] > > On Wed, Dec 5, 2018 at 2:53 PM Mans Rullgard wrote: >> >> Commit b34050fadb86 ("auxdisplay: charlcd: Fix and clean up handling of >> x/y commands") fixed some problems by rewriting the parsing code, >> but also broke things further by removing the check for a complete >> command before attempting to parse it. As a result, parsing is >> terminated at the first x or y character. > > Why is it necessary to check for ";" to be exactly at the end? Or, in > other words, what requires it? > > If the syntax supported by parse_xy() is wrong for some reason, we > should fix that (instead of adding a check before calling it). Suppose the command "\e[Lx0y0;" is written to the device. The charlcd_write_char() function adds one character at a time to the escape sequence buffer. Once the 'L' has been seen, it calls handle_lcd_special_code() after each additional character is added. On encountering the 'x' this function will attempt to parse the command using parse_xy(), which fails since it is incomplete. It is nonetheless reported as processed, and the escape sequence is flushed. The remainder of the command (i.e. "0y0;") is then displayed as text. Since parse_xy() can never return success (true) unless there is a semicolon, it is pointless to call it until we have seen one. With the characters being added to the buffer one by one, it is enough to check for semicolon at the end. BTW, the parsing of this command has been broken since 3.2-rc1 due to commit 129957069e6a. -- Måns Rullgård
Re: [PATCH 1/8] perf: Allow to block process in syscall tracepoints
On Wed, Dec 05, 2018 at 12:35:03PM -0500, Steven Rostedt wrote: > On Wed, 5 Dec 2018 17:05:02 +0100 > Jiri Olsa wrote: > > > diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c > > index 3b2490b81918..e55cf9169a03 100644 > > --- a/arch/x86/entry/common.c > > +++ b/arch/x86/entry/common.c > > @@ -60,6 +60,32 @@ static void do_audit_syscall_entry(struct pt_regs *regs, > > u32 arch) > > } > > } > > > > +static void trace_block_syscall(struct pt_regs *regs, bool enter) > > +{ > > + current->perf_blocked = true; > > + > > + do { > > + schedule_timeout(100 * HZ); > > + current->perf_blocked_cnt = 0; > > + > > + if (enter) { > > + /* perf syscalls:* enter */ > > + perf_trace_syscall_enter(regs); > > + > > + /* perf raw_syscalls:* enter */ > > + perf_trace_sys_enter(_sys_enter, regs, > > regs->orig_ax); > > + } else { > > + /* perf syscalls:* enter */ > > + perf_trace_syscall_exit(regs); > > + > > + /* perf raw_syscalls:* enter */ > > + perf_trace_sys_exit(_sys_exit, regs, regs->ax); > > + } > > + } while (current->perf_blocked_cnt); > > I was thinking, if the process reading the perf buffer dies, how do we > tell this task to continue on? when the tracer process dies, its event will get uninstalled from the trace_event_call::perf_events list, so the next iteration of above loop won't get any blocked event and it will leave (assuming there's just single event) jirka
Re: [PATCH 1/8] perf: Allow to block process in syscall tracepoints
On Wed, Dec 05, 2018 at 12:35:03PM -0500, Steven Rostedt wrote: > On Wed, 5 Dec 2018 17:05:02 +0100 > Jiri Olsa wrote: > > > diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c > > index 3b2490b81918..e55cf9169a03 100644 > > --- a/arch/x86/entry/common.c > > +++ b/arch/x86/entry/common.c > > @@ -60,6 +60,32 @@ static void do_audit_syscall_entry(struct pt_regs *regs, > > u32 arch) > > } > > } > > > > +static void trace_block_syscall(struct pt_regs *regs, bool enter) > > +{ > > + current->perf_blocked = true; > > + > > + do { > > + schedule_timeout(100 * HZ); > > + current->perf_blocked_cnt = 0; > > + > > + if (enter) { > > + /* perf syscalls:* enter */ > > + perf_trace_syscall_enter(regs); > > + > > + /* perf raw_syscalls:* enter */ > > + perf_trace_sys_enter(_sys_enter, regs, > > regs->orig_ax); > > + } else { > > + /* perf syscalls:* enter */ > > + perf_trace_syscall_exit(regs); > > + > > + /* perf raw_syscalls:* enter */ > > + perf_trace_sys_exit(_sys_exit, regs, regs->ax); > > + } > > + } while (current->perf_blocked_cnt); > > I was thinking, if the process reading the perf buffer dies, how do we > tell this task to continue on? when the tracer process dies, its event will get uninstalled from the trace_event_call::perf_events list, so the next iteration of above loop won't get any blocked event and it will leave (assuming there's just single event) jirka
Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()
On Wed, Dec 05, 2018 at 09:27:09AM -0800, Dave Hansen wrote: > On 12/4/18 6:13 PM, Jerome Glisse wrote: > > On Tue, Dec 04, 2018 at 05:06:49PM -0800, Dave Hansen wrote: > >> OK, but there are 1024*1024 matrix cells on a systems with 1024 > >> proximity domains (ACPI term for NUMA node). So it sounds like you are > >> proposing a million-directory approach. > > > > No, pseudo code: > > struct list links; > > > > for (unsigned r = 0; r < nrows; r++) { > > for (unsigned c = 0; c < ncolumns; c++) { > > if (!link_find(links, hmat[r][c].bandwidth, > >hmat[r][c].latency)) { > > link = link_new(hmat[r][c].bandwidth, > > hmat[r][c].latency); > > // add initiator and target correspond to that row > > // and columns to this new link > > list_add(, links); > > } > > } > > } > > > > So all cells that have same property are under the same link. > > OK, so the "link" here is like a cable. It's like saying, "we have a > network and everything is connected with an ethernet cable that can do > 1gbit/sec". > > But, what actually connects an initiator to a target? I assume we still > need to know which link is used for each target/initiator pair. Where > is that enumerated? ls /sys/bus/hms/devices/v0-0-link/ node0 power subsystem uevent uid bandwidth latency v0-1-target v0-15-initiator v0-21-targetv0-4-initiator v0-7-initiator v0-10-initiator v0-13-initiator v0-16-initiator v0-2-initiator v0-11-initiator v0-14-initiator v0-17-initiator v0-3-initiator v0-5-initiator v0-8-initiator v0-6-initiator v0-9-initiator v0-12-initiator v0-10-initiator So above is 16 CPUs (initiators*) and 2 targets all connected through a common link. This means that all the initiators connected to this link can access all the target connected to this link. The bandwidth and latency is best case scenario for instance when only one initiator is accessing the target. Initiator can only access target they share a link with or an extended path through a bridge. So if you have an initiator connected to link0 and a target connected to link1 and there is a bridge link0 to link1 then the initiator can access the target memory in link1 but the bandwidth and latency will be min(link0.bandwidth, link1.bandwidth, bridge.bandwidth) min(link0.latency, link1.latency, bridge.latency) You can really match one to one a link with bus in your system. For instance with PCIE if you only have 16lanes PCIE devices you only devince one link directory for all your PCIE devices (ignore the PCIE peer to peer scenario here). You add a bride between your PCIE link to your NUMA node link (the node to which this PCIE root complex belongs), this means that PCIE device can access the local node memory with given bandwidth and latency (best case). > > I think this just means we need a million symlinks to a "link" instead > of a million link directories. Still not great. > > > Note that userspace can parse all this once during its initialization > > and create pools of target to use. > > It sounds like you're agreeing that there is too much data in this > interface for applications to _regularly_ parse it. We need some > central thing that parses it all and caches the results. No so there is 2 kinds of applications: 1) average one: i am using device {1, 3, 9} give me best memory for those devices 2) advance one: what is the topology of this system ? Parse the topology and partition its workload accordingly For case 1 you can pre-parse stuff but this can be done by helper library but for case 2 there is no amount of pre-parsing you can do in kernel, only the application knows its own architecture and thus only the application knows what matter in the topology. Is the application looking for big chunk of memory even if it is slow ? Is it also looking for fast memory close to X and Y ? ... Each application will care about different thing and there is no telling what its gonna be. So what i am saying is that this information is likely to be parse once by the application during startup ie the sysfs is not something that is continuously read and parse by the application (unless application also care about hotplug and then we are talking about the 1% of the 1%). Cheers, Jérôme
Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()
On Wed, Dec 05, 2018 at 09:27:09AM -0800, Dave Hansen wrote: > On 12/4/18 6:13 PM, Jerome Glisse wrote: > > On Tue, Dec 04, 2018 at 05:06:49PM -0800, Dave Hansen wrote: > >> OK, but there are 1024*1024 matrix cells on a systems with 1024 > >> proximity domains (ACPI term for NUMA node). So it sounds like you are > >> proposing a million-directory approach. > > > > No, pseudo code: > > struct list links; > > > > for (unsigned r = 0; r < nrows; r++) { > > for (unsigned c = 0; c < ncolumns; c++) { > > if (!link_find(links, hmat[r][c].bandwidth, > >hmat[r][c].latency)) { > > link = link_new(hmat[r][c].bandwidth, > > hmat[r][c].latency); > > // add initiator and target correspond to that row > > // and columns to this new link > > list_add(, links); > > } > > } > > } > > > > So all cells that have same property are under the same link. > > OK, so the "link" here is like a cable. It's like saying, "we have a > network and everything is connected with an ethernet cable that can do > 1gbit/sec". > > But, what actually connects an initiator to a target? I assume we still > need to know which link is used for each target/initiator pair. Where > is that enumerated? ls /sys/bus/hms/devices/v0-0-link/ node0 power subsystem uevent uid bandwidth latency v0-1-target v0-15-initiator v0-21-targetv0-4-initiator v0-7-initiator v0-10-initiator v0-13-initiator v0-16-initiator v0-2-initiator v0-11-initiator v0-14-initiator v0-17-initiator v0-3-initiator v0-5-initiator v0-8-initiator v0-6-initiator v0-9-initiator v0-12-initiator v0-10-initiator So above is 16 CPUs (initiators*) and 2 targets all connected through a common link. This means that all the initiators connected to this link can access all the target connected to this link. The bandwidth and latency is best case scenario for instance when only one initiator is accessing the target. Initiator can only access target they share a link with or an extended path through a bridge. So if you have an initiator connected to link0 and a target connected to link1 and there is a bridge link0 to link1 then the initiator can access the target memory in link1 but the bandwidth and latency will be min(link0.bandwidth, link1.bandwidth, bridge.bandwidth) min(link0.latency, link1.latency, bridge.latency) You can really match one to one a link with bus in your system. For instance with PCIE if you only have 16lanes PCIE devices you only devince one link directory for all your PCIE devices (ignore the PCIE peer to peer scenario here). You add a bride between your PCIE link to your NUMA node link (the node to which this PCIE root complex belongs), this means that PCIE device can access the local node memory with given bandwidth and latency (best case). > > I think this just means we need a million symlinks to a "link" instead > of a million link directories. Still not great. > > > Note that userspace can parse all this once during its initialization > > and create pools of target to use. > > It sounds like you're agreeing that there is too much data in this > interface for applications to _regularly_ parse it. We need some > central thing that parses it all and caches the results. No so there is 2 kinds of applications: 1) average one: i am using device {1, 3, 9} give me best memory for those devices 2) advance one: what is the topology of this system ? Parse the topology and partition its workload accordingly For case 1 you can pre-parse stuff but this can be done by helper library but for case 2 there is no amount of pre-parsing you can do in kernel, only the application knows its own architecture and thus only the application knows what matter in the topology. Is the application looking for big chunk of memory even if it is slow ? Is it also looking for fast memory close to X and Y ? ... Each application will care about different thing and there is no telling what its gonna be. So what i am saying is that this information is likely to be parse once by the application during startup ie the sysfs is not something that is continuously read and parse by the application (unless application also care about hotplug and then we are talking about the 1% of the 1%). Cheers, Jérôme
Re: [RFT PATCH v1 0/4] Unify CPU topology across ARM64 & RISC-V
On 11/29/2018 4:28 PM, Atish Patra wrote: The cpu-map DT entry in ARM64 can describe the CPU topology in much better way compared to other existing approaches. RISC-V can easily adopt this binding to represent it's own CPU topology. Thus, both cpu-map DT binding and topology parsing code can be moved to a common location so that RISC-V or any other architecture can leverage that. The relevant discussion regarding unifying cpu topology can be found in [1]. arch_topology seems to be a perfect place to move the common code. I have not introduced any functional changes in the moved code. The only downside in this approach is that the capacity code will be executed for RISC-V as well. But, it will exit immediately after not able to find the appropriate DT node. If the overhead is considered too much, we can always compile out capacity related functions under a different config for the architectures that do not support them. The patches have been tested for RISC-V and compile tested for ARM64 & x86. The socket change[2] is also now part of this series. [1] https://lkml.org/lkml/2018/11/6/19 [2] https://lkml.org/lkml/2018/11/7/918 QEMU changes for RISC-V topology are available at https://github.com/atishp04/riscv-qemu/tree/cpu_topo Apologies for the previous patch series with incorrect title and was sent only to kernel mailing list due to a bug in my config. Please ignore that. Atish Patra (3): dt-binding: cpu-topology: Move cpu-map to a common binding. cpu-topology: Move cpu topology code to common code. RISC-V: Parse cpu topology during boot. Sudeep Holla (1): Documentation: DT: arm: add support for sockets defining package boundaries .../{arm/topology.txt => cpu/cpu-topology.txt} | 133 +++-- arch/arm64/include/asm/topology.h | 22 -- arch/arm64/kernel/topology.c | 303 + arch/riscv/Kconfig | 1 + arch/riscv/kernel/smpboot.c| 3 + drivers/base/arch_topology.c | 294 include/linux/arch_topology.h | 26 ++ include/linux/topology.h | 1 + 8 files changed, 435 insertions(+), 348 deletions(-) rename Documentation/devicetree/bindings/{arm/topology.txt => cpu/cpu-topology.txt} (66%) -- 2.7.4 Seems to test fine on QDF2400. Tested-by: Jeffrey Hugo I did see that git am complained about patch #2 - patch:103: space before tab in indent. }; patch:114: space before tab in indent. }; warning: 2 lines add whitespace errors. -- Jeffrey Hugo Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [RFT PATCH v1 0/4] Unify CPU topology across ARM64 & RISC-V
On 11/29/2018 4:28 PM, Atish Patra wrote: The cpu-map DT entry in ARM64 can describe the CPU topology in much better way compared to other existing approaches. RISC-V can easily adopt this binding to represent it's own CPU topology. Thus, both cpu-map DT binding and topology parsing code can be moved to a common location so that RISC-V or any other architecture can leverage that. The relevant discussion regarding unifying cpu topology can be found in [1]. arch_topology seems to be a perfect place to move the common code. I have not introduced any functional changes in the moved code. The only downside in this approach is that the capacity code will be executed for RISC-V as well. But, it will exit immediately after not able to find the appropriate DT node. If the overhead is considered too much, we can always compile out capacity related functions under a different config for the architectures that do not support them. The patches have been tested for RISC-V and compile tested for ARM64 & x86. The socket change[2] is also now part of this series. [1] https://lkml.org/lkml/2018/11/6/19 [2] https://lkml.org/lkml/2018/11/7/918 QEMU changes for RISC-V topology are available at https://github.com/atishp04/riscv-qemu/tree/cpu_topo Apologies for the previous patch series with incorrect title and was sent only to kernel mailing list due to a bug in my config. Please ignore that. Atish Patra (3): dt-binding: cpu-topology: Move cpu-map to a common binding. cpu-topology: Move cpu topology code to common code. RISC-V: Parse cpu topology during boot. Sudeep Holla (1): Documentation: DT: arm: add support for sockets defining package boundaries .../{arm/topology.txt => cpu/cpu-topology.txt} | 133 +++-- arch/arm64/include/asm/topology.h | 22 -- arch/arm64/kernel/topology.c | 303 + arch/riscv/Kconfig | 1 + arch/riscv/kernel/smpboot.c| 3 + drivers/base/arch_topology.c | 294 include/linux/arch_topology.h | 26 ++ include/linux/topology.h | 1 + 8 files changed, 435 insertions(+), 348 deletions(-) rename Documentation/devicetree/bindings/{arm/topology.txt => cpu/cpu-topology.txt} (66%) -- 2.7.4 Seems to test fine on QDF2400. Tested-by: Jeffrey Hugo I did see that git am complained about patch #2 - patch:103: space before tab in indent. }; patch:114: space before tab in indent. }; warning: 2 lines add whitespace errors. -- Jeffrey Hugo Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [PATCH v2 2/2] soc: fsl: dpio: use a cpumask to identify which cpus are unused
On Tue, Dec 4, 2018 at 5:23 AM Ioana Ciornei wrote: > > The current implementation of the dpio driver uses a static next_cpu > variable to keep track of the index of the next cpu available. This > approach does not handle well unbinding and binding dpio devices in a > random order. For example, unbinding a dpio and then binding it again > with the driver, will generate the below error: > > $ echo dpio.5 > /sys/bus/fsl-mc/drivers/fsl_mc_dpio/unbind > $ echo dpio.5 > /sys/bus/fsl-mc/drivers/fsl_mc_dpio/bind > [ 103.946380] fsl_mc_dpio dpio.5: probe failed. Number of DPIOs exceeds > NR_CPUS. > [ 103.955157] fsl_mc_dpio dpio.5: fsl_mc_driver_probe failed: -34 > -bash: echo: write error: No such device > > Fix this error by keeping a global cpumask of unused cpus that will be > updated at every dpaa2_dpio_[probe,remove]. Applied. Thanks. > > Signed-off-by: Ioana Ciornei > --- > Changes in v2: > - added kernel-doc comment to the dpaa2_io_get_cpu function > > drivers/soc/fsl/dpio/dpio-driver.c | 25 - > drivers/soc/fsl/dpio/dpio-service.c | 13 + > include/soc/fsl/dpaa2-io.h | 2 ++ > 3 files changed, 31 insertions(+), 9 deletions(-) > > diff --git a/drivers/soc/fsl/dpio/dpio-driver.c > b/drivers/soc/fsl/dpio/dpio-driver.c > index e58fcc9..832175c 100644 > --- a/drivers/soc/fsl/dpio/dpio-driver.c > +++ b/drivers/soc/fsl/dpio/dpio-driver.c > @@ -30,6 +30,8 @@ struct dpio_priv { > struct dpaa2_io *io; > }; > > +static cpumask_var_t cpus_unused_mask; > + > static irqreturn_t dpio_irq_handler(int irq_num, void *arg) > { > struct device *dev = (struct device *)arg; > @@ -86,7 +88,7 @@ static int dpaa2_dpio_probe(struct fsl_mc_device *dpio_dev) > struct dpio_priv *priv; > int err = -ENOMEM; > struct device *dev = _dev->dev; > - static int next_cpu = -1; > + int possible_next_cpu; > > priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL); > if (!priv) > @@ -128,17 +130,14 @@ static int dpaa2_dpio_probe(struct fsl_mc_device > *dpio_dev) > desc.dpio_id = dpio_dev->obj_desc.id; > > /* get the cpu to use for the affinity hint */ > - if (next_cpu == -1) > - next_cpu = cpumask_first(cpu_online_mask); > - else > - next_cpu = cpumask_next(next_cpu, cpu_online_mask); > - > - if (!cpu_possible(next_cpu)) { > + possible_next_cpu = cpumask_first(cpus_unused_mask); > + if (possible_next_cpu >= nr_cpu_ids) { > dev_err(dev, "probe failed. Number of DPIOs exceeds > NR_CPUS.\n"); > err = -ERANGE; > goto err_allocate_irqs; > } > - desc.cpu = next_cpu; > + desc.cpu = possible_next_cpu; > + cpumask_clear_cpu(possible_next_cpu, cpus_unused_mask); > > /* > * Set the CENA regs to be the cache inhibited area of the portal to > @@ -211,7 +210,7 @@ static int dpaa2_dpio_remove(struct fsl_mc_device > *dpio_dev) > { > struct device *dev; > struct dpio_priv *priv; > - int err; > + int err = 0, cpu; > > dev = _dev->dev; > priv = dev_get_drvdata(dev); > @@ -220,6 +219,9 @@ static int dpaa2_dpio_remove(struct fsl_mc_device > *dpio_dev) > > dpio_teardown_irqs(dpio_dev); > > + cpu = dpaa2_io_get_cpu(priv->io); > + cpumask_set_cpu(cpu, cpus_unused_mask); > + > err = fsl_mc_portal_allocate(dpio_dev, 0, _dev->mc_io); > if (err) { > dev_err(dev, "MC portal allocation failed\n"); > @@ -267,11 +269,16 @@ static int dpaa2_dpio_remove(struct fsl_mc_device > *dpio_dev) > > static int dpio_driver_init(void) > { > + if (!zalloc_cpumask_var(_unused_mask, GFP_KERNEL)) > + return -ENOMEM; > + cpumask_copy(cpus_unused_mask, cpu_online_mask); > + > return fsl_mc_driver_register(_dpio_driver); > } > > static void dpio_driver_exit(void) > { > + free_cpumask_var(cpus_unused_mask); > fsl_mc_driver_unregister(_dpio_driver); > } > module_init(dpio_driver_init); > diff --git a/drivers/soc/fsl/dpio/dpio-service.c > b/drivers/soc/fsl/dpio/dpio-service.c > index 21c3e32..3b60258 100644 > --- a/drivers/soc/fsl/dpio/dpio-service.c > +++ b/drivers/soc/fsl/dpio/dpio-service.c > @@ -215,6 +215,19 @@ irqreturn_t dpaa2_io_irq(struct dpaa2_io *obj) > } > > /** > + * dpaa2_io_get_cpu() - get the cpu associated with a given DPIO object > + * > + * @d: the given DPIO object. > + * > + * Return the cpu associated with the DPIO object > + */ > +int dpaa2_io_get_cpu(struct dpaa2_io *d) > +{ > + return d->dpio_desc.cpu; > +} > +EXPORT_SYMBOL(dpaa2_io_get_cpu); > + > +/** > * dpaa2_io_service_register() - Prepare for servicing of FQDAN or CDAN > * notifications on the given DPIO service. > * @d: the given DPIO service. > diff --git a/include/soc/fsl/dpaa2-io.h b/include/soc/fsl/dpaa2-io.h > index
Re: [PATCH v2 2/2] soc: fsl: dpio: use a cpumask to identify which cpus are unused
On Tue, Dec 4, 2018 at 5:23 AM Ioana Ciornei wrote: > > The current implementation of the dpio driver uses a static next_cpu > variable to keep track of the index of the next cpu available. This > approach does not handle well unbinding and binding dpio devices in a > random order. For example, unbinding a dpio and then binding it again > with the driver, will generate the below error: > > $ echo dpio.5 > /sys/bus/fsl-mc/drivers/fsl_mc_dpio/unbind > $ echo dpio.5 > /sys/bus/fsl-mc/drivers/fsl_mc_dpio/bind > [ 103.946380] fsl_mc_dpio dpio.5: probe failed. Number of DPIOs exceeds > NR_CPUS. > [ 103.955157] fsl_mc_dpio dpio.5: fsl_mc_driver_probe failed: -34 > -bash: echo: write error: No such device > > Fix this error by keeping a global cpumask of unused cpus that will be > updated at every dpaa2_dpio_[probe,remove]. Applied. Thanks. > > Signed-off-by: Ioana Ciornei > --- > Changes in v2: > - added kernel-doc comment to the dpaa2_io_get_cpu function > > drivers/soc/fsl/dpio/dpio-driver.c | 25 - > drivers/soc/fsl/dpio/dpio-service.c | 13 + > include/soc/fsl/dpaa2-io.h | 2 ++ > 3 files changed, 31 insertions(+), 9 deletions(-) > > diff --git a/drivers/soc/fsl/dpio/dpio-driver.c > b/drivers/soc/fsl/dpio/dpio-driver.c > index e58fcc9..832175c 100644 > --- a/drivers/soc/fsl/dpio/dpio-driver.c > +++ b/drivers/soc/fsl/dpio/dpio-driver.c > @@ -30,6 +30,8 @@ struct dpio_priv { > struct dpaa2_io *io; > }; > > +static cpumask_var_t cpus_unused_mask; > + > static irqreturn_t dpio_irq_handler(int irq_num, void *arg) > { > struct device *dev = (struct device *)arg; > @@ -86,7 +88,7 @@ static int dpaa2_dpio_probe(struct fsl_mc_device *dpio_dev) > struct dpio_priv *priv; > int err = -ENOMEM; > struct device *dev = _dev->dev; > - static int next_cpu = -1; > + int possible_next_cpu; > > priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL); > if (!priv) > @@ -128,17 +130,14 @@ static int dpaa2_dpio_probe(struct fsl_mc_device > *dpio_dev) > desc.dpio_id = dpio_dev->obj_desc.id; > > /* get the cpu to use for the affinity hint */ > - if (next_cpu == -1) > - next_cpu = cpumask_first(cpu_online_mask); > - else > - next_cpu = cpumask_next(next_cpu, cpu_online_mask); > - > - if (!cpu_possible(next_cpu)) { > + possible_next_cpu = cpumask_first(cpus_unused_mask); > + if (possible_next_cpu >= nr_cpu_ids) { > dev_err(dev, "probe failed. Number of DPIOs exceeds > NR_CPUS.\n"); > err = -ERANGE; > goto err_allocate_irqs; > } > - desc.cpu = next_cpu; > + desc.cpu = possible_next_cpu; > + cpumask_clear_cpu(possible_next_cpu, cpus_unused_mask); > > /* > * Set the CENA regs to be the cache inhibited area of the portal to > @@ -211,7 +210,7 @@ static int dpaa2_dpio_remove(struct fsl_mc_device > *dpio_dev) > { > struct device *dev; > struct dpio_priv *priv; > - int err; > + int err = 0, cpu; > > dev = _dev->dev; > priv = dev_get_drvdata(dev); > @@ -220,6 +219,9 @@ static int dpaa2_dpio_remove(struct fsl_mc_device > *dpio_dev) > > dpio_teardown_irqs(dpio_dev); > > + cpu = dpaa2_io_get_cpu(priv->io); > + cpumask_set_cpu(cpu, cpus_unused_mask); > + > err = fsl_mc_portal_allocate(dpio_dev, 0, _dev->mc_io); > if (err) { > dev_err(dev, "MC portal allocation failed\n"); > @@ -267,11 +269,16 @@ static int dpaa2_dpio_remove(struct fsl_mc_device > *dpio_dev) > > static int dpio_driver_init(void) > { > + if (!zalloc_cpumask_var(_unused_mask, GFP_KERNEL)) > + return -ENOMEM; > + cpumask_copy(cpus_unused_mask, cpu_online_mask); > + > return fsl_mc_driver_register(_dpio_driver); > } > > static void dpio_driver_exit(void) > { > + free_cpumask_var(cpus_unused_mask); > fsl_mc_driver_unregister(_dpio_driver); > } > module_init(dpio_driver_init); > diff --git a/drivers/soc/fsl/dpio/dpio-service.c > b/drivers/soc/fsl/dpio/dpio-service.c > index 21c3e32..3b60258 100644 > --- a/drivers/soc/fsl/dpio/dpio-service.c > +++ b/drivers/soc/fsl/dpio/dpio-service.c > @@ -215,6 +215,19 @@ irqreturn_t dpaa2_io_irq(struct dpaa2_io *obj) > } > > /** > + * dpaa2_io_get_cpu() - get the cpu associated with a given DPIO object > + * > + * @d: the given DPIO object. > + * > + * Return the cpu associated with the DPIO object > + */ > +int dpaa2_io_get_cpu(struct dpaa2_io *d) > +{ > + return d->dpio_desc.cpu; > +} > +EXPORT_SYMBOL(dpaa2_io_get_cpu); > + > +/** > * dpaa2_io_service_register() - Prepare for servicing of FQDAN or CDAN > * notifications on the given DPIO service. > * @d: the given DPIO service. > diff --git a/include/soc/fsl/dpaa2-io.h b/include/soc/fsl/dpaa2-io.h > index
Re: [PATCH v2 1/2] soc: fsl: dpio: cleanup the cpu array on dpaa2_io_down
On Tue, Dec 4, 2018 at 5:21 AM Ioana Ciornei wrote: > > The dpio_by_cpu array should not contain a reference to a freed dpaa2_io > object. This patch adds the necessary cleanup in dpaa2_io_down. Applied. Thanks. > > Signed-off-by: Ioana Ciornei > --- > Changes in v2: > - none > > drivers/soc/fsl/dpio/dpio-service.c | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/drivers/soc/fsl/dpio/dpio-service.c > b/drivers/soc/fsl/dpio/dpio-service.c > index 9b17f72..21c3e32 100644 > --- a/drivers/soc/fsl/dpio/dpio-service.c > +++ b/drivers/soc/fsl/dpio/dpio-service.c > @@ -160,6 +160,11 @@ struct dpaa2_io *dpaa2_io_create(const struct > dpaa2_io_desc *desc) > */ > void dpaa2_io_down(struct dpaa2_io *d) > { > + spin_lock(_list_lock); > + dpio_by_cpu[d->dpio_desc.cpu] = NULL; > + list_del(>node); > + spin_unlock(_list_lock); > + > kfree(d); > } > > -- > 1.9.1 >
Re: [PATCH v2 1/2] soc: fsl: dpio: cleanup the cpu array on dpaa2_io_down
On Tue, Dec 4, 2018 at 5:21 AM Ioana Ciornei wrote: > > The dpio_by_cpu array should not contain a reference to a freed dpaa2_io > object. This patch adds the necessary cleanup in dpaa2_io_down. Applied. Thanks. > > Signed-off-by: Ioana Ciornei > --- > Changes in v2: > - none > > drivers/soc/fsl/dpio/dpio-service.c | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/drivers/soc/fsl/dpio/dpio-service.c > b/drivers/soc/fsl/dpio/dpio-service.c > index 9b17f72..21c3e32 100644 > --- a/drivers/soc/fsl/dpio/dpio-service.c > +++ b/drivers/soc/fsl/dpio/dpio-service.c > @@ -160,6 +160,11 @@ struct dpaa2_io *dpaa2_io_create(const struct > dpaa2_io_desc *desc) > */ > void dpaa2_io_down(struct dpaa2_io *d) > { > + spin_lock(_list_lock); > + dpio_by_cpu[d->dpio_desc.cpu] = NULL; > + list_del(>node); > + spin_unlock(_list_lock); > + > kfree(d); > } > > -- > 1.9.1 >
Re: [PATCH 2/2] ASoC: DA7219: Implement error check on reg read and write
On Wed, Dec 5, 2018 at 4:28 AM Mark Brown wrote: > > On Wed, Dec 05, 2018 at 10:21:04AM +, Adam Thomson wrote: > > > If the previous I2C access failed, how can we be sure that the write back > > to HW > > of 0xFF even succeeds? More importantly these error returns won't > > necessarily > > stop subsequent calls to controls within the Codec I believe, so you could > > still > > see unwanted writes to HW via I2C, if I2C is sporadically operational. > > Again I > > don't see this update resolving that. The key thing is to resolve why even > > just > > one I2C transaction fails. > > Right, it's just not clear what we can constructively do if the I2C bus > falls to bits other than log things and the I2C controllers will > generally do that themselves. There's no guarantee what made it > through to the device or what will in future make it through to the > device. I agree, there is no guarantee here once things have gone wrong, and the concerns above are reasonable. However, in the real world, I2C transactions do sometimes fail for various reasons. The I2C (and other bus) APIs have ways of reporting up their errors so callers can take appropriate action. This codec driver can run on all kinds of hardware that can experience transient I2C errors, thus it sounds like a reasonable idea to have the driver do some error checking on the APIs it calls and take whatever action it can. Just ignoring the errors and proceeding like nothing is wrong is one option, but we can probably do a little better by at least checking for errors, abort the current operation, and pass up errors to higher layers when an i2c transaction failure is detected. If nothing else, this would enable higher policy layers to take appropriate corrective action - for example, if there is an i2c error when configuring a codec, it seems advisable to report this up so that a machine driver would know to abort and not turn on downstream amplifiers [I am assuming here that something like this would happen, I don't know if the sound stack really works this way]. Once the default "check, abort and report" error checking is in place, we could perhaps think about actions that the driver could take to recover from various failures, such as resetting the device or unwinding previous transactions before aborting, or retrying but those are all policy, and this patch is more mechanism that enables policy. As for this patch itself, I would recommend using snd_soc_component_read rather than snd_soc_component_read32() which is fundamentally broken and afaict should be removed, since there is no way to distinguish between its error return "(u32)-1" and the valid register value 0x. (Edit: I notice that you did this in v2, so please ignore, but still replying here since this seems to be where the active discussion is). -Dan
Re: [PATCH 2/9] ACPI/IORT: Use helper functions to access dev->iommu_fwspec
On 04/12/2018 16:29, Joerg Roedel wrote: From: Joerg Roedel Use the new helpers dev_iommu_fwspec_get()/set() to access the dev->iommu_fwspec pointer. This makes it easier to move that pointer later into another struct. Cc: Lorenzo Pieralisi Signed-off-by: Joerg Roedel --- drivers/acpi/arm64/iort.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index 70f4e80b9246..754a67ba49e5 100644 --- a/drivers/acpi/arm64/iort.c +++ b/drivers/acpi/arm64/iort.c @@ -779,7 +779,7 @@ static inline bool iort_iommu_driver_enabled(u8 type) static struct acpi_iort_node *iort_get_msi_resv_iommu(struct device *dev) { struct acpi_iort_node *iommu; - struct iommu_fwspec *fwspec = dev->iommu_fwspec; + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); iommu = iort_get_iort_node(fwspec->iommu_fwnode); @@ -824,6 +824,7 @@ static inline int iort_add_device_replay(const struct iommu_ops *ops, */ int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head) { + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); struct acpi_iort_its_group *its; struct acpi_iort_node *iommu_node, *its_node = NULL; int i, resv = 0; @@ -841,9 +842,9 @@ int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head) * a given PCI or named component may map IDs to. */ - for (i = 0; i < dev->iommu_fwspec->num_ids; i++) { + for (i = 0; i < fwspec->num_ids; i++) { its_node = iort_node_map_id(iommu_node, - dev->iommu_fwspec->ids[i], + fwspec->ids[i], NULL, IORT_MSI_TYPE); if (its_node) break; @@ -1036,6 +1037,7 @@ void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size) */ const struct iommu_ops *iort_iommu_configure(struct device *dev) { + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); struct acpi_iort_node *node, *parent; const struct iommu_ops *ops; u32 streamid = 0; @@ -1045,7 +1047,7 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev) * If we already translated the fwspec there * is nothing left to do, return the iommu_ops. */ - ops = iort_fwspec_iommu_ops(dev->iommu_fwspec); + ops = iort_fwspec_iommu_ops(fwspec); if (ops) return ops; @@ -1084,7 +1086,7 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev) * add_device callback for dev, replay it to get things in order. */ if (!err) { - ops = iort_fwspec_iommu_ops(dev->iommu_fwspec); + ops = iort_fwspec_iommu_ops(fwspec); This needs to reload the new fwspec initialised by iort_iommu_xlate(), in the same manner as the OF code. I think the best thing to do is encapsulate the dev_iommu_fwspec_get() call in iort_fwspec_iommu_ops(), and have that take dev as its argument directly. Robin. err = iort_add_device_replay(ops, dev); }
Re: [PATCH 2/2] ASoC: DA7219: Implement error check on reg read and write
On Wed, Dec 5, 2018 at 4:28 AM Mark Brown wrote: > > On Wed, Dec 05, 2018 at 10:21:04AM +, Adam Thomson wrote: > > > If the previous I2C access failed, how can we be sure that the write back > > to HW > > of 0xFF even succeeds? More importantly these error returns won't > > necessarily > > stop subsequent calls to controls within the Codec I believe, so you could > > still > > see unwanted writes to HW via I2C, if I2C is sporadically operational. > > Again I > > don't see this update resolving that. The key thing is to resolve why even > > just > > one I2C transaction fails. > > Right, it's just not clear what we can constructively do if the I2C bus > falls to bits other than log things and the I2C controllers will > generally do that themselves. There's no guarantee what made it > through to the device or what will in future make it through to the > device. I agree, there is no guarantee here once things have gone wrong, and the concerns above are reasonable. However, in the real world, I2C transactions do sometimes fail for various reasons. The I2C (and other bus) APIs have ways of reporting up their errors so callers can take appropriate action. This codec driver can run on all kinds of hardware that can experience transient I2C errors, thus it sounds like a reasonable idea to have the driver do some error checking on the APIs it calls and take whatever action it can. Just ignoring the errors and proceeding like nothing is wrong is one option, but we can probably do a little better by at least checking for errors, abort the current operation, and pass up errors to higher layers when an i2c transaction failure is detected. If nothing else, this would enable higher policy layers to take appropriate corrective action - for example, if there is an i2c error when configuring a codec, it seems advisable to report this up so that a machine driver would know to abort and not turn on downstream amplifiers [I am assuming here that something like this would happen, I don't know if the sound stack really works this way]. Once the default "check, abort and report" error checking is in place, we could perhaps think about actions that the driver could take to recover from various failures, such as resetting the device or unwinding previous transactions before aborting, or retrying but those are all policy, and this patch is more mechanism that enables policy. As for this patch itself, I would recommend using snd_soc_component_read rather than snd_soc_component_read32() which is fundamentally broken and afaict should be removed, since there is no way to distinguish between its error return "(u32)-1" and the valid register value 0x. (Edit: I notice that you did this in v2, so please ignore, but still replying here since this seems to be where the active discussion is). -Dan
Re: [PATCH 2/9] ACPI/IORT: Use helper functions to access dev->iommu_fwspec
On 04/12/2018 16:29, Joerg Roedel wrote: From: Joerg Roedel Use the new helpers dev_iommu_fwspec_get()/set() to access the dev->iommu_fwspec pointer. This makes it easier to move that pointer later into another struct. Cc: Lorenzo Pieralisi Signed-off-by: Joerg Roedel --- drivers/acpi/arm64/iort.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index 70f4e80b9246..754a67ba49e5 100644 --- a/drivers/acpi/arm64/iort.c +++ b/drivers/acpi/arm64/iort.c @@ -779,7 +779,7 @@ static inline bool iort_iommu_driver_enabled(u8 type) static struct acpi_iort_node *iort_get_msi_resv_iommu(struct device *dev) { struct acpi_iort_node *iommu; - struct iommu_fwspec *fwspec = dev->iommu_fwspec; + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); iommu = iort_get_iort_node(fwspec->iommu_fwnode); @@ -824,6 +824,7 @@ static inline int iort_add_device_replay(const struct iommu_ops *ops, */ int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head) { + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); struct acpi_iort_its_group *its; struct acpi_iort_node *iommu_node, *its_node = NULL; int i, resv = 0; @@ -841,9 +842,9 @@ int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head) * a given PCI or named component may map IDs to. */ - for (i = 0; i < dev->iommu_fwspec->num_ids; i++) { + for (i = 0; i < fwspec->num_ids; i++) { its_node = iort_node_map_id(iommu_node, - dev->iommu_fwspec->ids[i], + fwspec->ids[i], NULL, IORT_MSI_TYPE); if (its_node) break; @@ -1036,6 +1037,7 @@ void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size) */ const struct iommu_ops *iort_iommu_configure(struct device *dev) { + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); struct acpi_iort_node *node, *parent; const struct iommu_ops *ops; u32 streamid = 0; @@ -1045,7 +1047,7 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev) * If we already translated the fwspec there * is nothing left to do, return the iommu_ops. */ - ops = iort_fwspec_iommu_ops(dev->iommu_fwspec); + ops = iort_fwspec_iommu_ops(fwspec); if (ops) return ops; @@ -1084,7 +1086,7 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev) * add_device callback for dev, replay it to get things in order. */ if (!err) { - ops = iort_fwspec_iommu_ops(dev->iommu_fwspec); + ops = iort_fwspec_iommu_ops(fwspec); This needs to reload the new fwspec initialised by iort_iommu_xlate(), in the same manner as the OF code. I think the best thing to do is encapsulate the dev_iommu_fwspec_get() call in iort_fwspec_iommu_ops(), and have that take dev as its argument directly. Robin. err = iort_add_device_replay(ops, dev); }
Re: [PATCH v1 3/5] perf cs-etm: Support for NO_SYNC packet
On Tue, 4 Dec 2018 at 20:31, wrote: > > On Mon, Nov 19, 2018 at 11:27:59AM -0700, Mathieu Poirier wrote: > > On Sun, Nov 11, 2018 at 12:59:41PM +0800, Leo Yan wrote: > > > As described in OpenCSD (CoreSight decoder lib), in the decoding stream > > > it includes one trace element with type OCSD_GEN_TRC_ELEM_NO_SYNC; the > > > element indicates 'either at start of decode, or after overflow / bad > > > packet', we should take it as a signal for the tracing off and this will > > > cause tracing discontinuity. From the trace dump with 'perf script', > > > sometimes the element OCSD_GEN_TRC_ELEM_NO_SYNC collaborates with > > > element OCSD_GEN_TRC_ELEM_TRACE_ON to show the tracing flow have been > > > turned off and on, in this case the cs-etm code has handled TRACE_ON > > > packet well so we observe the tracing discontinuity; but in another case > > > it only inserts the element OCSD_GEN_TRC_ELEM_NO_SYNC into instructions > > > packets, we miss to handle the case if has only standalone NO_SYNC > > > element and users cannot receive the info for tracing discontinuity. > > > > > > This patch introduces new type CS_ETM_TRACE_OFF to generate packet for > > > receiving element OCSD_GEN_TRC_ELEM_NO_SYNC from decoder; when generate > > > sample, CS_ETM_TRACE_OFF packet has almost the same behaviour with > > > > Here you have used the word "almost", leading me to beleive there are cases > > where the handling of TRACE_ON and NO_SYNC packets differ, but the > > implementation enacts the same behavior for both. > > > > Mike or Robert, can you confirm that TRACE_ON and NO_SYNC packets should be > > treated the same way? > > I also would like to get suggestions/comments from Mike and Rob. > > For NO_SYNC packet, Mike before has given some explination for it: > > "looking at the decoder flow, when the decoder is reset, then it is > returned to an unsynchronised state and a NO_SYNC will be output. > With perf, it can concatenate multiple trace buffers into a single > section of the perf.data file. To ensure that the decode process can > see the start of a new buffer, the drivers will insert a barrier > packet between different buffers so that the decoder can spot the > boundaries. When the decoder hits a barrier packet it will > automatically reset so that it correctly decodes the next trace > buffer. This could be what you are seeing here." > > So I think there still have some difference between TRACE_ON and > NO_SYNC packets, TRACE_ON packet indicates the start of trace and it's > also possible caused by tracing discontinuity; NO_SYNC packets usually > caused by an unsynchronised state. But both of them presents > discontinuity in perf trace data. > > I prefer to use two different packet types to present them, this can > let the code to be more readable and easier to be maintained later. Absolutely. Make sure to get to the bottom of the story with Mike and/or Robert before sending your next patchset. > > If you agree with this, I also will rephrase the commit log to avoid > confusion that these two packets are the same thing. Yes please. > > > Leo, if handling of the TRACE_ON and NO_SYNC packets is the same then we > > should > > treat them the same way in cs_etm_decoder__gen_trace_elem_printer(), i.e > > simply > > call cs_etm_decoder__buffer_trace_on(). That way packet handling in > > cs-etm.c > > doesn't change. Otherwise see my comments below. > > > > > CS_ETM_TRACE_ON packet: both of them invokes cs_etm__flush() to generate > > > samples for the previous instructions packet, and in cs_etm__sample() it > > > also needs to generate samples if TRACE_OFF packet is followed by one > > > sequential instructions packet. This patch also converts the address to > > > 0 for TRACE_OFF packet, this is same with TRACE_ON packet as well. > > > > > > Signed-off-by: Leo Yan > > > --- > > > tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 10 ++ > > > tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 7 --- > > > tools/perf/util/cs-etm.c| 15 +++ > > > 3 files changed, 25 insertions(+), 7 deletions(-) > > > > > > diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c > > > b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c > > > index 5efb616..9d52727 100644 > > > --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c > > > +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c > > > @@ -369,6 +369,14 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder > > > *decoder, > > > } > > > > > > static ocsd_datapath_resp_t > > > +cs_etm_decoder__buffer_trace_off(struct cs_etm_decoder *decoder, > > > +const uint8_t trace_chan_id) > > > +{ > > > + return cs_etm_decoder__buffer_packet(decoder, trace_chan_id, > > > +CS_ETM_TRACE_OFF); > > > +} > > > + > > > +static ocsd_datapath_resp_t > > > cs_etm_decoder__buffer_trace_on(struct cs_etm_decoder *decoder, > > > const uint8_t
Re: [PATCH v1 3/5] perf cs-etm: Support for NO_SYNC packet
On Tue, 4 Dec 2018 at 20:31, wrote: > > On Mon, Nov 19, 2018 at 11:27:59AM -0700, Mathieu Poirier wrote: > > On Sun, Nov 11, 2018 at 12:59:41PM +0800, Leo Yan wrote: > > > As described in OpenCSD (CoreSight decoder lib), in the decoding stream > > > it includes one trace element with type OCSD_GEN_TRC_ELEM_NO_SYNC; the > > > element indicates 'either at start of decode, or after overflow / bad > > > packet', we should take it as a signal for the tracing off and this will > > > cause tracing discontinuity. From the trace dump with 'perf script', > > > sometimes the element OCSD_GEN_TRC_ELEM_NO_SYNC collaborates with > > > element OCSD_GEN_TRC_ELEM_TRACE_ON to show the tracing flow have been > > > turned off and on, in this case the cs-etm code has handled TRACE_ON > > > packet well so we observe the tracing discontinuity; but in another case > > > it only inserts the element OCSD_GEN_TRC_ELEM_NO_SYNC into instructions > > > packets, we miss to handle the case if has only standalone NO_SYNC > > > element and users cannot receive the info for tracing discontinuity. > > > > > > This patch introduces new type CS_ETM_TRACE_OFF to generate packet for > > > receiving element OCSD_GEN_TRC_ELEM_NO_SYNC from decoder; when generate > > > sample, CS_ETM_TRACE_OFF packet has almost the same behaviour with > > > > Here you have used the word "almost", leading me to beleive there are cases > > where the handling of TRACE_ON and NO_SYNC packets differ, but the > > implementation enacts the same behavior for both. > > > > Mike or Robert, can you confirm that TRACE_ON and NO_SYNC packets should be > > treated the same way? > > I also would like to get suggestions/comments from Mike and Rob. > > For NO_SYNC packet, Mike before has given some explination for it: > > "looking at the decoder flow, when the decoder is reset, then it is > returned to an unsynchronised state and a NO_SYNC will be output. > With perf, it can concatenate multiple trace buffers into a single > section of the perf.data file. To ensure that the decode process can > see the start of a new buffer, the drivers will insert a barrier > packet between different buffers so that the decoder can spot the > boundaries. When the decoder hits a barrier packet it will > automatically reset so that it correctly decodes the next trace > buffer. This could be what you are seeing here." > > So I think there still have some difference between TRACE_ON and > NO_SYNC packets, TRACE_ON packet indicates the start of trace and it's > also possible caused by tracing discontinuity; NO_SYNC packets usually > caused by an unsynchronised state. But both of them presents > discontinuity in perf trace data. > > I prefer to use two different packet types to present them, this can > let the code to be more readable and easier to be maintained later. Absolutely. Make sure to get to the bottom of the story with Mike and/or Robert before sending your next patchset. > > If you agree with this, I also will rephrase the commit log to avoid > confusion that these two packets are the same thing. Yes please. > > > Leo, if handling of the TRACE_ON and NO_SYNC packets is the same then we > > should > > treat them the same way in cs_etm_decoder__gen_trace_elem_printer(), i.e > > simply > > call cs_etm_decoder__buffer_trace_on(). That way packet handling in > > cs-etm.c > > doesn't change. Otherwise see my comments below. > > > > > CS_ETM_TRACE_ON packet: both of them invokes cs_etm__flush() to generate > > > samples for the previous instructions packet, and in cs_etm__sample() it > > > also needs to generate samples if TRACE_OFF packet is followed by one > > > sequential instructions packet. This patch also converts the address to > > > 0 for TRACE_OFF packet, this is same with TRACE_ON packet as well. > > > > > > Signed-off-by: Leo Yan > > > --- > > > tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 10 ++ > > > tools/perf/util/cs-etm-decoder/cs-etm-decoder.h | 7 --- > > > tools/perf/util/cs-etm.c| 15 +++ > > > 3 files changed, 25 insertions(+), 7 deletions(-) > > > > > > diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c > > > b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c > > > index 5efb616..9d52727 100644 > > > --- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c > > > +++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c > > > @@ -369,6 +369,14 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder > > > *decoder, > > > } > > > > > > static ocsd_datapath_resp_t > > > +cs_etm_decoder__buffer_trace_off(struct cs_etm_decoder *decoder, > > > +const uint8_t trace_chan_id) > > > +{ > > > + return cs_etm_decoder__buffer_packet(decoder, trace_chan_id, > > > +CS_ETM_TRACE_OFF); > > > +} > > > + > > > +static ocsd_datapath_resp_t > > > cs_etm_decoder__buffer_trace_on(struct cs_etm_decoder *decoder, > > > const uint8_t
Re: [PATCH 1/2] ARM: Remove '-p' from LDFLAGS
On Tue, Dec 4, 2018 at 5:42 PM Nathan Chancellor wrote: > > This option is not supported by lld: > > ld.lld: error: unknown argument: -p > > This has been a no-op in binutils since 2004 (see commit dea514f51da1 in > that tree). Given that the lowest officially supported of binutils for > the kernel is 2.20, which was released in 2009, nobody needs this flag > around so just remove it. Commit 1a381d4a0a9a ("arm64: remove no-op -p > linker flag") did the same for arm64. While investigating the context of 1a381d4a0a9a for review, I found that bfd had removed -p's implementation a long time ago, but kept the flag as a no-op for compatibility. I asked that the patch be extended to arm as well, but don't think a v2 was ever cut. Thanks for sending this Nathan. Reviewed-by: Nick Desaulniers > > Signed-off-by: Nathan Chancellor > --- > arch/arm/Makefile | 2 +- > arch/arm/boot/compressed/Makefile | 2 -- > 2 files changed, 1 insertion(+), 3 deletions(-) > > diff --git a/arch/arm/Makefile b/arch/arm/Makefile > index 05a91d8b89f3..e2a0baf36766 100644 > --- a/arch/arm/Makefile > +++ b/arch/arm/Makefile > @@ -10,7 +10,7 @@ > # > # Copyright (C) 1995-2001 by Russell King > > -LDFLAGS_vmlinux:=-p --no-undefined -X --pic-veneer > +LDFLAGS_vmlinux:= --no-undefined -X --pic-veneer > ifeq ($(CONFIG_CPU_ENDIAN_BE8),y) > LDFLAGS_vmlinux+= --be8 > KBUILD_LDFLAGS_MODULE += --be8 > diff --git a/arch/arm/boot/compressed/Makefile > b/arch/arm/boot/compressed/Makefile > index 1f5a5ffe7fcf..dcd07bd24b85 100644 > --- a/arch/arm/boot/compressed/Makefile > +++ b/arch/arm/boot/compressed/Makefile > @@ -131,8 +131,6 @@ endif > ifeq ($(CONFIG_CPU_ENDIAN_BE8),y) > LDFLAGS_vmlinux += --be8 > endif > -# ? > -LDFLAGS_vmlinux += -p > # Report unresolved symbol references > LDFLAGS_vmlinux += --no-undefined > # Delete all temporary local symbols > -- > 2.20.0.rc1 > -- Thanks, ~Nick Desaulniers
Re: [PATCH 1/2] ARM: Remove '-p' from LDFLAGS
On Tue, Dec 4, 2018 at 5:42 PM Nathan Chancellor wrote: > > This option is not supported by lld: > > ld.lld: error: unknown argument: -p > > This has been a no-op in binutils since 2004 (see commit dea514f51da1 in > that tree). Given that the lowest officially supported of binutils for > the kernel is 2.20, which was released in 2009, nobody needs this flag > around so just remove it. Commit 1a381d4a0a9a ("arm64: remove no-op -p > linker flag") did the same for arm64. While investigating the context of 1a381d4a0a9a for review, I found that bfd had removed -p's implementation a long time ago, but kept the flag as a no-op for compatibility. I asked that the patch be extended to arm as well, but don't think a v2 was ever cut. Thanks for sending this Nathan. Reviewed-by: Nick Desaulniers > > Signed-off-by: Nathan Chancellor > --- > arch/arm/Makefile | 2 +- > arch/arm/boot/compressed/Makefile | 2 -- > 2 files changed, 1 insertion(+), 3 deletions(-) > > diff --git a/arch/arm/Makefile b/arch/arm/Makefile > index 05a91d8b89f3..e2a0baf36766 100644 > --- a/arch/arm/Makefile > +++ b/arch/arm/Makefile > @@ -10,7 +10,7 @@ > # > # Copyright (C) 1995-2001 by Russell King > > -LDFLAGS_vmlinux:=-p --no-undefined -X --pic-veneer > +LDFLAGS_vmlinux:= --no-undefined -X --pic-veneer > ifeq ($(CONFIG_CPU_ENDIAN_BE8),y) > LDFLAGS_vmlinux+= --be8 > KBUILD_LDFLAGS_MODULE += --be8 > diff --git a/arch/arm/boot/compressed/Makefile > b/arch/arm/boot/compressed/Makefile > index 1f5a5ffe7fcf..dcd07bd24b85 100644 > --- a/arch/arm/boot/compressed/Makefile > +++ b/arch/arm/boot/compressed/Makefile > @@ -131,8 +131,6 @@ endif > ifeq ($(CONFIG_CPU_ENDIAN_BE8),y) > LDFLAGS_vmlinux += --be8 > endif > -# ? > -LDFLAGS_vmlinux += -p > # Report unresolved symbol references > LDFLAGS_vmlinux += --no-undefined > # Delete all temporary local symbols > -- > 2.20.0.rc1 > -- Thanks, ~Nick Desaulniers
Re: rcu_preempt caused oom
On Wed, Dec 05, 2018 at 08:42:54AM +, He, Bo wrote: > I double checked the .config, we don't enable CONFIG_NO_HZ_FULL . > Our previous logs can dump all the task backtrace, and kthread (the > rcu_preempt, rcu_sched, and rcu_bh tasks) are all in "I" state not in "R" > state, my understandings are if it's the side-effect of causing RCU's > kthreads to be run at SCHED_FIFO priority 1, the kthreads should be in R > state. Hmmm... Well, the tasks could in theory be waiting on a blocking mutex. But in practice the grace-period kthreads wait on events, so that makes no sense. Is it possible for you to dump out the grace-period kthread's stack, for example, with sysreq-t? (Steve might know a better way to do this.) > I will do more experiments and keep you update once we have more findings: > 1. set the kthread priority to SCHED_FIFO without CONFIG_RCU_BOOST and see if > the issue can reproduce. That sounds like a most excellent experiment! > 2. check more ftrace to double confirm why there is no > trace_rcu_quiescent_state_report and most of the trace_rcu_grace_period are > in "AccWaitCB". As noted earlier, to see something interesting, you will need to start the ftrace before the grace period starts. This would probably mean having ftrace running before starting the test. Starting the ftrace after the hang commences is unlikely to produce useful information. Thanx, Paul > -Original Message- > From: Paul E. McKenney > Sent: Wednesday, December 5, 2018 3:50 AM > To: He, Bo > Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; > j...@joshtriplett.org; mathieu.desnoy...@efficios.com; > jiangshan...@gmail.com; Zhang, Jun ; Xiao, Jin > ; Zhang, Yanmin ; Bai, Jie A > > Subject: Re: rcu_preempt caused oom > > On Tue, Dec 04, 2018 at 07:50:04AM +, He, Bo wrote: > > Hi, Paul: > > the enclosed is the log trigger the 120s hung_task_panic without other > > debug patches, the hung task is blocked at __wait_rcu_gp, it means the > > rcu_cpu_stall can't detect the scenario: > > echo 1 > /proc/sys/kernel/panic_on_rcu_stall > > echo 7 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout > > Not necessarily. If there is an RCU CPU stall warning, blocking within > __wait_rcu_gp() is expected behavior. It is possible that the problem is > that although the grace period is completing as required, the callbacks are > not being invoked in a timely fashion. And that could happen if you had > CONFIG_NO_HZ_FULL and a bunch of nohz_full CPUs, or, alternatively, callback > offloading enabled. But I don't see these in your previous emails. Another > possible cause is that the grace-period kthread is being delayed, so that the > grace period never starts. This seems unlikely, but it is the only thing > thus far that matches the symptoms. > > CONFIG_RCU_BOOST=y has the side-effect of causing RCU's kthreads to be run at > SCHED_FIFO priority 1, and that would help in the case where RCU's > grace-period kthread (the rcu_preempt, rcu_sched, and rcu_bh tasks, all of > which execute in the rcu_gp_kthread() function) was being starved of CPU time. > > Does that sound likely? > > Thanx, Paul > > > -Original Message- > > From: Paul E. McKenney > > Sent: Monday, December 3, 2018 9:57 PM > > To: He, Bo > > Cc: Steven Rostedt ; > > linux-kernel@vger.kernel.org; j...@joshtriplett.org; > > mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Zhang, Jun > > ; Xiao, Jin ; Zhang, Yanmin > > > > Subject: Re: rcu_preempt caused oom > > > > On Mon, Dec 03, 2018 at 07:44:03AM +, He, Bo wrote: > > > Thanks, we have run the test for the whole weekend and not reproduce the > > > issue, so we confirm the CONFIG_RCU_BOOST can fix the issue. > > > > Very good, that is encouraging. Perhaps I should think about making > > CONFIG_RCU_BOOST=y the default for CONFIG_PREEMPT in mainline, at least for > > architectures for which rt_mutexes are implemented. > > > > > We have enabled the rcupdate.rcu_cpu_stall_timeout=7 and also set panic > > > on rcu stall and will see if we can see the panic, will keep you posed > > > with the test results. > > > echo 1 > /proc/sys/kernel/panic_on_rcu_stall > > > > Looking forward to seeing what is going on! Of course, to reproduce, you > > will need to again build with CONFIG_RCU_BOOST=n. > > > > Thanx, Paul > > > > > -Original Message- > > > From: Paul E. McKenney > > > Sent: Saturday, December 1, 2018 12:49 AM > > > To: He, Bo > > > Cc: Steven Rostedt ; > > > linux-kernel@vger.kernel.org; j...@joshtriplett.org; > > > mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Zhang, Jun > > > ; Xiao, Jin ; Zhang, Yanmin > > > > > > Subject: Re: rcu_preempt caused oom > > > > > > On Fri, Nov 30, 2018 at 03:18:58PM +, He, Bo wrote: > > > > Here is the kernel
Re: rcu_preempt caused oom
On Wed, Dec 05, 2018 at 08:42:54AM +, He, Bo wrote: > I double checked the .config, we don't enable CONFIG_NO_HZ_FULL . > Our previous logs can dump all the task backtrace, and kthread (the > rcu_preempt, rcu_sched, and rcu_bh tasks) are all in "I" state not in "R" > state, my understandings are if it's the side-effect of causing RCU's > kthreads to be run at SCHED_FIFO priority 1, the kthreads should be in R > state. Hmmm... Well, the tasks could in theory be waiting on a blocking mutex. But in practice the grace-period kthreads wait on events, so that makes no sense. Is it possible for you to dump out the grace-period kthread's stack, for example, with sysreq-t? (Steve might know a better way to do this.) > I will do more experiments and keep you update once we have more findings: > 1. set the kthread priority to SCHED_FIFO without CONFIG_RCU_BOOST and see if > the issue can reproduce. That sounds like a most excellent experiment! > 2. check more ftrace to double confirm why there is no > trace_rcu_quiescent_state_report and most of the trace_rcu_grace_period are > in "AccWaitCB". As noted earlier, to see something interesting, you will need to start the ftrace before the grace period starts. This would probably mean having ftrace running before starting the test. Starting the ftrace after the hang commences is unlikely to produce useful information. Thanx, Paul > -Original Message- > From: Paul E. McKenney > Sent: Wednesday, December 5, 2018 3:50 AM > To: He, Bo > Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; > j...@joshtriplett.org; mathieu.desnoy...@efficios.com; > jiangshan...@gmail.com; Zhang, Jun ; Xiao, Jin > ; Zhang, Yanmin ; Bai, Jie A > > Subject: Re: rcu_preempt caused oom > > On Tue, Dec 04, 2018 at 07:50:04AM +, He, Bo wrote: > > Hi, Paul: > > the enclosed is the log trigger the 120s hung_task_panic without other > > debug patches, the hung task is blocked at __wait_rcu_gp, it means the > > rcu_cpu_stall can't detect the scenario: > > echo 1 > /proc/sys/kernel/panic_on_rcu_stall > > echo 7 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout > > Not necessarily. If there is an RCU CPU stall warning, blocking within > __wait_rcu_gp() is expected behavior. It is possible that the problem is > that although the grace period is completing as required, the callbacks are > not being invoked in a timely fashion. And that could happen if you had > CONFIG_NO_HZ_FULL and a bunch of nohz_full CPUs, or, alternatively, callback > offloading enabled. But I don't see these in your previous emails. Another > possible cause is that the grace-period kthread is being delayed, so that the > grace period never starts. This seems unlikely, but it is the only thing > thus far that matches the symptoms. > > CONFIG_RCU_BOOST=y has the side-effect of causing RCU's kthreads to be run at > SCHED_FIFO priority 1, and that would help in the case where RCU's > grace-period kthread (the rcu_preempt, rcu_sched, and rcu_bh tasks, all of > which execute in the rcu_gp_kthread() function) was being starved of CPU time. > > Does that sound likely? > > Thanx, Paul > > > -Original Message- > > From: Paul E. McKenney > > Sent: Monday, December 3, 2018 9:57 PM > > To: He, Bo > > Cc: Steven Rostedt ; > > linux-kernel@vger.kernel.org; j...@joshtriplett.org; > > mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Zhang, Jun > > ; Xiao, Jin ; Zhang, Yanmin > > > > Subject: Re: rcu_preempt caused oom > > > > On Mon, Dec 03, 2018 at 07:44:03AM +, He, Bo wrote: > > > Thanks, we have run the test for the whole weekend and not reproduce the > > > issue, so we confirm the CONFIG_RCU_BOOST can fix the issue. > > > > Very good, that is encouraging. Perhaps I should think about making > > CONFIG_RCU_BOOST=y the default for CONFIG_PREEMPT in mainline, at least for > > architectures for which rt_mutexes are implemented. > > > > > We have enabled the rcupdate.rcu_cpu_stall_timeout=7 and also set panic > > > on rcu stall and will see if we can see the panic, will keep you posed > > > with the test results. > > > echo 1 > /proc/sys/kernel/panic_on_rcu_stall > > > > Looking forward to seeing what is going on! Of course, to reproduce, you > > will need to again build with CONFIG_RCU_BOOST=n. > > > > Thanx, Paul > > > > > -Original Message- > > > From: Paul E. McKenney > > > Sent: Saturday, December 1, 2018 12:49 AM > > > To: He, Bo > > > Cc: Steven Rostedt ; > > > linux-kernel@vger.kernel.org; j...@joshtriplett.org; > > > mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Zhang, Jun > > > ; Xiao, Jin ; Zhang, Yanmin > > > > > > Subject: Re: rcu_preempt caused oom > > > > > > On Fri, Nov 30, 2018 at 03:18:58PM +, He, Bo wrote: > > > > Here is the kernel
Re: [PATCH 14/14] power: supply: axp288: use the BIT() macro
Hi, On Mon, Nov 26, 2018 at 05:27:55PM +0200, Priit Laes wrote: > From: Olliver Schinagl > > Make use of the recommended BIT() macro for bit defines. > > Signed-off-by: Olliver Schinagl > Signed-off-by: Priit Laes > --- Thanks, queued to power-supply-next. -- Sebastian > drivers/power/supply/axp288_charger.c | 35 ++-- > 1 file changed, 18 insertions(+), 17 deletions(-) > > diff --git a/drivers/power/supply/axp288_charger.c > b/drivers/power/supply/axp288_charger.c > index 735658e..f8c6da9 100644 > --- a/drivers/power/supply/axp288_charger.c > +++ b/drivers/power/supply/axp288_charger.c > @@ -16,6 +16,7 @@ > */ > > #include > +#include > #include > #include > #include > @@ -29,17 +30,17 @@ > #include > #include > > -#define PS_STAT_VBUS_TRIGGER (1 << 0) > -#define PS_STAT_BAT_CHRG_DIR (1 << 2) > -#define PS_STAT_VBAT_ABOVE_VHOLD (1 << 3) > -#define PS_STAT_VBUS_VALID (1 << 4) > -#define PS_STAT_VBUS_PRESENT (1 << 5) > +#define PS_STAT_VBUS_TRIGGER BIT(0) > +#define PS_STAT_BAT_CHRG_DIR BIT(2) > +#define PS_STAT_VBAT_ABOVE_VHOLD BIT(3) > +#define PS_STAT_VBUS_VALID BIT(4) > +#define PS_STAT_VBUS_PRESENT BIT(5) > > -#define CHRG_STAT_BAT_SAFE_MODE (1 << 3) > -#define CHRG_STAT_BAT_VALID (1 << 4) > -#define CHRG_STAT_BAT_PRESENT(1 << 5) > -#define CHRG_STAT_CHARGING (1 << 6) > -#define CHRG_STAT_PMIC_OTP (1 << 7) > +#define CHRG_STAT_BAT_SAFE_MODE BIT(3) > +#define CHRG_STAT_BAT_VALID BIT(4) > +#define CHRG_STAT_BAT_PRESENTBIT(5) > +#define CHRG_STAT_CHARGING BIT(6) > +#define CHRG_STAT_PMIC_OTP BIT(7) > > #define VBUS_ISPOUT_CUR_LIM_MASK 0x03 > #define VBUS_ISPOUT_CUR_LIM_BIT_POS 0 > @@ -52,33 +53,33 @@ > #define VBUS_ISPOUT_VHOLD_SET_OFFSET 4000/* 4000mV */ > #define VBUS_ISPOUT_VHOLD_SET_LSB_RES100 /* 100mV */ > #define VBUS_ISPOUT_VHOLD_SET_4300MV 0x3 /* 4300mV */ > -#define VBUS_ISPOUT_VBUS_PATH_DIS(1 << 7) > +#define VBUS_ISPOUT_VBUS_PATH_DISBIT(7) > > #define CHRG_CCCV_CC_MASK0xf /* 4 bits */ > #define CHRG_CCCV_CC_BIT_POS 0 > #define CHRG_CCCV_CC_OFFSET 200 /* 200mA */ > #define CHRG_CCCV_CC_LSB_RES 200 /* 200mA */ > -#define CHRG_CCCV_ITERM_20P (1 << 4)/* 20% of CC */ > +#define CHRG_CCCV_ITERM_20P BIT(4) /* 20% of CC */ > #define CHRG_CCCV_CV_MASK0x60/* 2 bits */ > #define CHRG_CCCV_CV_BIT_POS 5 > #define CHRG_CCCV_CV_4100MV 0x0 /* 4.10V */ > #define CHRG_CCCV_CV_4150MV 0x1 /* 4.15V */ > #define CHRG_CCCV_CV_4200MV 0x2 /* 4.20V */ > #define CHRG_CCCV_CV_4350MV 0x3 /* 4.35V */ > -#define CHRG_CCCV_CHG_EN (1 << 7) > +#define CHRG_CCCV_CHG_EN BIT(7) > > #define CNTL2_CC_TIMEOUT_MASK0x3 /* 2 bits */ > #define CNTL2_CC_TIMEOUT_OFFSET 6 /* 6 Hrs */ > #define CNTL2_CC_TIMEOUT_LSB_RES 2 /* 2 Hrs */ > #define CNTL2_CC_TIMEOUT_12HRS 0x3 /* 12 Hrs */ > -#define CNTL2_CHGLED_TYPEB (1 << 4) > -#define CNTL2_CHG_OUT_TURNON (1 << 5) > +#define CNTL2_CHGLED_TYPEB BIT(4) > +#define CNTL2_CHG_OUT_TURNON BIT(5) > #define CNTL2_PC_TIMEOUT_MASK0xC0 > #define CNTL2_PC_TIMEOUT_OFFSET 40 /* 40 mins */ > #define CNTL2_PC_TIMEOUT_LSB_RES 10 /* 10 mins */ > #define CNTL2_PC_TIMEOUT_70MINS 0x3 > > -#define CHRG_ILIM_TEMP_LOOP_EN (1 << 3) > +#define CHRG_ILIM_TEMP_LOOP_EN BIT(3) > #define CHRG_VBUS_ILIM_MASK 0xf0 > #define CHRG_VBUS_ILIM_BIT_POS 4 > #define CHRG_VBUS_ILIM_100MA 0x0 /* 100mA */ > @@ -94,7 +95,7 @@ > #define CHRG_VLTFC_0C0xA5/* 0 DegC */ > #define CHRG_VHTFC_45C 0x1F/* 45 DegC */ > > -#define FG_CNTL_OCV_ADJ_EN (1 << 3) > +#define FG_CNTL_OCV_ADJ_EN BIT(3) > > #define CV_4100MV4100/* 4100mV */ > #define CV_4150MV4150/* 4150mV */ > -- > git-series 0.9.1 signature.asc Description: PGP signature
Re: [PATCH 14/14] power: supply: axp288: use the BIT() macro
Hi, On Mon, Nov 26, 2018 at 05:27:55PM +0200, Priit Laes wrote: > From: Olliver Schinagl > > Make use of the recommended BIT() macro for bit defines. > > Signed-off-by: Olliver Schinagl > Signed-off-by: Priit Laes > --- Thanks, queued to power-supply-next. -- Sebastian > drivers/power/supply/axp288_charger.c | 35 ++-- > 1 file changed, 18 insertions(+), 17 deletions(-) > > diff --git a/drivers/power/supply/axp288_charger.c > b/drivers/power/supply/axp288_charger.c > index 735658e..f8c6da9 100644 > --- a/drivers/power/supply/axp288_charger.c > +++ b/drivers/power/supply/axp288_charger.c > @@ -16,6 +16,7 @@ > */ > > #include > +#include > #include > #include > #include > @@ -29,17 +30,17 @@ > #include > #include > > -#define PS_STAT_VBUS_TRIGGER (1 << 0) > -#define PS_STAT_BAT_CHRG_DIR (1 << 2) > -#define PS_STAT_VBAT_ABOVE_VHOLD (1 << 3) > -#define PS_STAT_VBUS_VALID (1 << 4) > -#define PS_STAT_VBUS_PRESENT (1 << 5) > +#define PS_STAT_VBUS_TRIGGER BIT(0) > +#define PS_STAT_BAT_CHRG_DIR BIT(2) > +#define PS_STAT_VBAT_ABOVE_VHOLD BIT(3) > +#define PS_STAT_VBUS_VALID BIT(4) > +#define PS_STAT_VBUS_PRESENT BIT(5) > > -#define CHRG_STAT_BAT_SAFE_MODE (1 << 3) > -#define CHRG_STAT_BAT_VALID (1 << 4) > -#define CHRG_STAT_BAT_PRESENT(1 << 5) > -#define CHRG_STAT_CHARGING (1 << 6) > -#define CHRG_STAT_PMIC_OTP (1 << 7) > +#define CHRG_STAT_BAT_SAFE_MODE BIT(3) > +#define CHRG_STAT_BAT_VALID BIT(4) > +#define CHRG_STAT_BAT_PRESENTBIT(5) > +#define CHRG_STAT_CHARGING BIT(6) > +#define CHRG_STAT_PMIC_OTP BIT(7) > > #define VBUS_ISPOUT_CUR_LIM_MASK 0x03 > #define VBUS_ISPOUT_CUR_LIM_BIT_POS 0 > @@ -52,33 +53,33 @@ > #define VBUS_ISPOUT_VHOLD_SET_OFFSET 4000/* 4000mV */ > #define VBUS_ISPOUT_VHOLD_SET_LSB_RES100 /* 100mV */ > #define VBUS_ISPOUT_VHOLD_SET_4300MV 0x3 /* 4300mV */ > -#define VBUS_ISPOUT_VBUS_PATH_DIS(1 << 7) > +#define VBUS_ISPOUT_VBUS_PATH_DISBIT(7) > > #define CHRG_CCCV_CC_MASK0xf /* 4 bits */ > #define CHRG_CCCV_CC_BIT_POS 0 > #define CHRG_CCCV_CC_OFFSET 200 /* 200mA */ > #define CHRG_CCCV_CC_LSB_RES 200 /* 200mA */ > -#define CHRG_CCCV_ITERM_20P (1 << 4)/* 20% of CC */ > +#define CHRG_CCCV_ITERM_20P BIT(4) /* 20% of CC */ > #define CHRG_CCCV_CV_MASK0x60/* 2 bits */ > #define CHRG_CCCV_CV_BIT_POS 5 > #define CHRG_CCCV_CV_4100MV 0x0 /* 4.10V */ > #define CHRG_CCCV_CV_4150MV 0x1 /* 4.15V */ > #define CHRG_CCCV_CV_4200MV 0x2 /* 4.20V */ > #define CHRG_CCCV_CV_4350MV 0x3 /* 4.35V */ > -#define CHRG_CCCV_CHG_EN (1 << 7) > +#define CHRG_CCCV_CHG_EN BIT(7) > > #define CNTL2_CC_TIMEOUT_MASK0x3 /* 2 bits */ > #define CNTL2_CC_TIMEOUT_OFFSET 6 /* 6 Hrs */ > #define CNTL2_CC_TIMEOUT_LSB_RES 2 /* 2 Hrs */ > #define CNTL2_CC_TIMEOUT_12HRS 0x3 /* 12 Hrs */ > -#define CNTL2_CHGLED_TYPEB (1 << 4) > -#define CNTL2_CHG_OUT_TURNON (1 << 5) > +#define CNTL2_CHGLED_TYPEB BIT(4) > +#define CNTL2_CHG_OUT_TURNON BIT(5) > #define CNTL2_PC_TIMEOUT_MASK0xC0 > #define CNTL2_PC_TIMEOUT_OFFSET 40 /* 40 mins */ > #define CNTL2_PC_TIMEOUT_LSB_RES 10 /* 10 mins */ > #define CNTL2_PC_TIMEOUT_70MINS 0x3 > > -#define CHRG_ILIM_TEMP_LOOP_EN (1 << 3) > +#define CHRG_ILIM_TEMP_LOOP_EN BIT(3) > #define CHRG_VBUS_ILIM_MASK 0xf0 > #define CHRG_VBUS_ILIM_BIT_POS 4 > #define CHRG_VBUS_ILIM_100MA 0x0 /* 100mA */ > @@ -94,7 +95,7 @@ > #define CHRG_VLTFC_0C0xA5/* 0 DegC */ > #define CHRG_VHTFC_45C 0x1F/* 45 DegC */ > > -#define FG_CNTL_OCV_ADJ_EN (1 << 3) > +#define FG_CNTL_OCV_ADJ_EN BIT(3) > > #define CV_4100MV4100/* 4100mV */ > #define CV_4150MV4150/* 4150mV */ > -- > git-series 0.9.1 signature.asc Description: PGP signature
Re: [PATCH 13/14] power: supply: axp20x: add missing include bitops.h
Hi, On Mon, Nov 26, 2018 at 05:27:54PM +0200, Priit Laes wrote: > From: Olliver Schinagl > > The axp20x_usb_power driver uses BIT() operations but lacks the include > for it. Include the bitops.h header file. > > Signed-off-by: Olliver Schinagl > Signed-off-by: Priit Laes > --- > drivers/power/supply/axp20x_usb_power.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/power/supply/axp20x_usb_power.c > b/drivers/power/supply/axp20x_usb_power.c > index 42001df..f52fe77 100644 > --- a/drivers/power/supply/axp20x_usb_power.c > +++ b/drivers/power/supply/axp20x_usb_power.c > @@ -10,6 +10,7 @@ > * option) any later version. > */ > > +#include > #include > #include > #include > -- > git-series 0.9.1 Thanks, queued to power-supply-next. -- Sebastian signature.asc Description: PGP signature
Re: [PATCH 13/14] power: supply: axp20x: add missing include bitops.h
Hi, On Mon, Nov 26, 2018 at 05:27:54PM +0200, Priit Laes wrote: > From: Olliver Schinagl > > The axp20x_usb_power driver uses BIT() operations but lacks the include > for it. Include the bitops.h header file. > > Signed-off-by: Olliver Schinagl > Signed-off-by: Priit Laes > --- > drivers/power/supply/axp20x_usb_power.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/power/supply/axp20x_usb_power.c > b/drivers/power/supply/axp20x_usb_power.c > index 42001df..f52fe77 100644 > --- a/drivers/power/supply/axp20x_usb_power.c > +++ b/drivers/power/supply/axp20x_usb_power.c > @@ -10,6 +10,7 @@ > * option) any later version. > */ > > +#include > #include > #include > #include > -- > git-series 0.9.1 Thanks, queued to power-supply-next. -- Sebastian signature.asc Description: PGP signature
Re: [RFC PATCH 02/14] mm/hms: heterogenenous memory system (HMS) documentation
On 2018-12-04 7:31 p.m., Jerome Glisse wrote: > How can i express multiple link, or memory that is only accessible > by a subset of the devices/CPUs. In today model they are back in > assumption like everyone can access all the node which do not hold > in what i am trying to do. Well multiple links are easy when you have a 'link' bus. Just add another link device under the bus. Technically, the accessibility issue is already encoded in sysfs. For example, through the PCI tree you can determine which ACS bits are set and determine which devices are behind the same root bridge the same way we do in the kernel p2pdma subsystem. This is all bus specific which is fine, but if we want to change that, we should have a common way for existing buses to describe these attributes in the existing tree. The new 'link' bus devices would have to have some way to describe cases if memory isn't accessible in some way across it. But really, I would say the kernel is responsible for telling you when memory is accessible to a list of initiators, so it should be part of the checks in a theoretical hbind api. This is already the approach p2pdma takes in-kernel: we have functions that tell you if two PCI devices can talk to each other and we have functions to give you memory accessible by a set of devices. What we don't have is a special tree that p2pdma users have to walk through to determine accessibility. In my eye's, you are just conflating a bunch of different issues that are better solved independently in the existing frameworks we have. And if they were tackled individually, you'd have a much easier time getting them merged one by one. Logan
Re: [RFC PATCH 02/14] mm/hms: heterogenenous memory system (HMS) documentation
On 2018-12-04 7:31 p.m., Jerome Glisse wrote: > How can i express multiple link, or memory that is only accessible > by a subset of the devices/CPUs. In today model they are back in > assumption like everyone can access all the node which do not hold > in what i am trying to do. Well multiple links are easy when you have a 'link' bus. Just add another link device under the bus. Technically, the accessibility issue is already encoded in sysfs. For example, through the PCI tree you can determine which ACS bits are set and determine which devices are behind the same root bridge the same way we do in the kernel p2pdma subsystem. This is all bus specific which is fine, but if we want to change that, we should have a common way for existing buses to describe these attributes in the existing tree. The new 'link' bus devices would have to have some way to describe cases if memory isn't accessible in some way across it. But really, I would say the kernel is responsible for telling you when memory is accessible to a list of initiators, so it should be part of the checks in a theoretical hbind api. This is already the approach p2pdma takes in-kernel: we have functions that tell you if two PCI devices can talk to each other and we have functions to give you memory accessible by a set of devices. What we don't have is a special tree that p2pdma users have to walk through to determine accessibility. In my eye's, you are just conflating a bunch of different issues that are better solved independently in the existing frameworks we have. And if they were tackled individually, you'd have a much easier time getting them merged one by one. Logan
Re: [PATCH v2 1/2] perf cs-etm: Set branch instruction flags in packet
On Tue, 4 Dec 2018 at 23:26, wrote: > > On Mon, Nov 19, 2018 at 03:26:17PM -0700, Mathieu Poirier wrote: > > On Sun, Nov 11, 2018 at 01:07:55PM +0800, Leo Yan wrote: > > > The perf sample data contains flags to indicate the hardware trace data > > > is belonging to which type branch instruction, thus this can be used to > > > print out the human readable string. Arm CoreSight ETM sample data is > > > missed to set flags and it is always set to zeros, this results in perf > > > tool skips to print string for instruction types. > > > > > > Arm CoreSight ETM supports different kinds instruction of A64, A32 and > > > T32; this patch is to set branch instruction flags in packet for these > > > ISAs. > > > > > > The brief idea for patch implementation is describe as below: > > > > > > - For element with OCSD_GEN_TRC_ELEM_TRACE_ON type, it is taken as trace > > > beginning packet; for element with OCSD_GEN_TRC_ELEM_NO_SYNC or > > > OCSD_GEN_TRC_ELEM_EO_TRACE, these two kinds elements are used to set > > > for trace end; > > > > > > As Mike suggested the packet stream might have more than one two > > > TRACE_ON packets, the first one TRACE_ON packet indicates trace end > > > and the second one is taken as trace restarting. We will handle this > > > special case in the upper layer with packet queue handling, which has > > > more context so it's more suitable fix up for it. This will be > > > accomplished in the sequential patch. > > > > > > - For instruction range packet, mainly base on three factors to decide > > > the branch instruction types: > > > > > > elem->last_i_type > > > elem->last_i_subtype > > > elem->last_instr_cond > > > > > > If the instruction is immediate branch but without link and return > > > flag, we consider it as function internal branch; in fact the > > > immediate branch also can be used to invoke the function entry, > > > usually this is only used in assembly code to directly call a symbol > > > and don't expect to return back; after reviewing kernel normal > > > functions and user space programs, both of them are very seldom to use > > > immediate branch for function call. On the other hand, if we want to > > > decide the immediate branch is for function branch jumping or for > > > function calling, we need to rely on the start address of next packet > > > and check the symbol offset for the start address, this will > > > introduce much complexity in the implementation. So for this version > > > we simply consider immediate branch as function internal branch. > > > Moreover, we rely on 'elem->last_instr_cond' to decide if the branch > > > instruction is a conditional branch or not. > > > > > > If the instruction is immediate branch with link, it's instruction > > > 'BL' and which is used for function call. > > > > > > If the instruction is indirect branch and with subtype > > > OCSD_S_INSTR_V7_IMPLIED_RET, the decoders gives the hint the function > > > return for below cases related with A32/T32 instruction; set this > > > branch flag as function return (Thanks for Al's suggestion). > > > > > > BX R14 > > > MOV PC, LR > > > POP {…, PC} > > > LDR PC, [SP], #offset > > > > > > If the instruction is indirect branch without link, this is > > > corresponding to instruction 'BR', this instruction usually is used > > > for dynamic link lib with below usage; so we think it's a return > > > instruction. > > > > > > 0680 <.plt>: > > > 680: a9bf7bf0stp x16, x30, [sp, #-16]! > > > 684: 9090adrpx16, 1 <__FRAME_END__+0xf630> > > > 688: f947fe11ldr x17, [x16, #4088] > > > 68c: 913fe210add x16, x16, #0xff8 > > > 690: d61f0220br x17 > > > > > > If the instruction is indirect branch with link, e.g BLR, we think > > > it's a function call. > > > > > > For function return, ARMv8 introduces a dedicated instruction 'ret', > > > which has flag of OCSD_S_INSTR_V8_RET. > > > > > > - For exception packets, this patch divides into three types: > > > > > > The first type of exception is caused by external logics like bus, > > > interrupt controller, debug module or PE reset or halt; this is > > > corresponding to flags "bcyi" which defined in doc perf-script.txt; > > > > > > The second type is for system call, this is set as "bcs" by following > > > definition in the doc; > > > > > > The third type is for CPU trap, data and instruction prefetch abort, > > > alignment abort; usually these exceptions are synchronous for CPU, so > > > set them as "bci" type. > > > > This is too long and needs to be broken down into pieces. I would split > > this > > patch in 3 heat, one for NO_SYNC and TRACE_ON, one for INSTR_RANGE and one > > for > > ELEM_EXCEPTION/ELEM_EXCEPTION_RET. > > This makes sense, will split to 3 patches. > > > > > > > Cc: Mathieu Poirier > > > Cc: Mike Leach > > >
Re: [PATCH v2 1/2] perf cs-etm: Set branch instruction flags in packet
On Tue, 4 Dec 2018 at 23:26, wrote: > > On Mon, Nov 19, 2018 at 03:26:17PM -0700, Mathieu Poirier wrote: > > On Sun, Nov 11, 2018 at 01:07:55PM +0800, Leo Yan wrote: > > > The perf sample data contains flags to indicate the hardware trace data > > > is belonging to which type branch instruction, thus this can be used to > > > print out the human readable string. Arm CoreSight ETM sample data is > > > missed to set flags and it is always set to zeros, this results in perf > > > tool skips to print string for instruction types. > > > > > > Arm CoreSight ETM supports different kinds instruction of A64, A32 and > > > T32; this patch is to set branch instruction flags in packet for these > > > ISAs. > > > > > > The brief idea for patch implementation is describe as below: > > > > > > - For element with OCSD_GEN_TRC_ELEM_TRACE_ON type, it is taken as trace > > > beginning packet; for element with OCSD_GEN_TRC_ELEM_NO_SYNC or > > > OCSD_GEN_TRC_ELEM_EO_TRACE, these two kinds elements are used to set > > > for trace end; > > > > > > As Mike suggested the packet stream might have more than one two > > > TRACE_ON packets, the first one TRACE_ON packet indicates trace end > > > and the second one is taken as trace restarting. We will handle this > > > special case in the upper layer with packet queue handling, which has > > > more context so it's more suitable fix up for it. This will be > > > accomplished in the sequential patch. > > > > > > - For instruction range packet, mainly base on three factors to decide > > > the branch instruction types: > > > > > > elem->last_i_type > > > elem->last_i_subtype > > > elem->last_instr_cond > > > > > > If the instruction is immediate branch but without link and return > > > flag, we consider it as function internal branch; in fact the > > > immediate branch also can be used to invoke the function entry, > > > usually this is only used in assembly code to directly call a symbol > > > and don't expect to return back; after reviewing kernel normal > > > functions and user space programs, both of them are very seldom to use > > > immediate branch for function call. On the other hand, if we want to > > > decide the immediate branch is for function branch jumping or for > > > function calling, we need to rely on the start address of next packet > > > and check the symbol offset for the start address, this will > > > introduce much complexity in the implementation. So for this version > > > we simply consider immediate branch as function internal branch. > > > Moreover, we rely on 'elem->last_instr_cond' to decide if the branch > > > instruction is a conditional branch or not. > > > > > > If the instruction is immediate branch with link, it's instruction > > > 'BL' and which is used for function call. > > > > > > If the instruction is indirect branch and with subtype > > > OCSD_S_INSTR_V7_IMPLIED_RET, the decoders gives the hint the function > > > return for below cases related with A32/T32 instruction; set this > > > branch flag as function return (Thanks for Al's suggestion). > > > > > > BX R14 > > > MOV PC, LR > > > POP {…, PC} > > > LDR PC, [SP], #offset > > > > > > If the instruction is indirect branch without link, this is > > > corresponding to instruction 'BR', this instruction usually is used > > > for dynamic link lib with below usage; so we think it's a return > > > instruction. > > > > > > 0680 <.plt>: > > > 680: a9bf7bf0stp x16, x30, [sp, #-16]! > > > 684: 9090adrpx16, 1 <__FRAME_END__+0xf630> > > > 688: f947fe11ldr x17, [x16, #4088] > > > 68c: 913fe210add x16, x16, #0xff8 > > > 690: d61f0220br x17 > > > > > > If the instruction is indirect branch with link, e.g BLR, we think > > > it's a function call. > > > > > > For function return, ARMv8 introduces a dedicated instruction 'ret', > > > which has flag of OCSD_S_INSTR_V8_RET. > > > > > > - For exception packets, this patch divides into three types: > > > > > > The first type of exception is caused by external logics like bus, > > > interrupt controller, debug module or PE reset or halt; this is > > > corresponding to flags "bcyi" which defined in doc perf-script.txt; > > > > > > The second type is for system call, this is set as "bcs" by following > > > definition in the doc; > > > > > > The third type is for CPU trap, data and instruction prefetch abort, > > > alignment abort; usually these exceptions are synchronous for CPU, so > > > set them as "bci" type. > > > > This is too long and needs to be broken down into pieces. I would split > > this > > patch in 3 heat, one for NO_SYNC and TRACE_ON, one for INSTR_RANGE and one > > for > > ELEM_EXCEPTION/ELEM_EXCEPTION_RET. > > This makes sense, will split to 3 patches. > > > > > > > Cc: Mathieu Poirier > > > Cc: Mike Leach > > >
Re: [PATCH] firmware: qcom: scm: fix compilation error when disabled
+ linux-arm-msm On Wed 21 Nov 18:32 PST 2018, Jonathan Marek wrote: > This fixes the case when CONFIG_QCOM_SCM is not enabled, and linux/errno.h > has not been included previously. > > Signed-off-by: Jonathan Marek Reviewed-by: Bjorn Andersson Regards, Bjorn > --- > drm/msm includes qcom_scm.h, but can be used with non-qcom hw (IMX5) > > include/linux/qcom_scm.h | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/include/linux/qcom_scm.h b/include/linux/qcom_scm.h > index 06996ad4f2bc..1637385bcc17 100644 > --- a/include/linux/qcom_scm.h > +++ b/include/linux/qcom_scm.h > @@ -67,6 +67,9 @@ extern int qcom_scm_iommu_secure_ptbl_init(u64 addr, u32 > size, u32 spare); > extern int qcom_scm_io_readl(phys_addr_t addr, unsigned int *val); > extern int qcom_scm_io_writel(phys_addr_t addr, unsigned int val); > #else > + > +#include > + > static inline > int qcom_scm_set_cold_boot_addr(void *entry, const cpumask_t *cpus) > { > -- > 2.17.1 >
Re: [PATCH] firmware: qcom: scm: fix compilation error when disabled
+ linux-arm-msm On Wed 21 Nov 18:32 PST 2018, Jonathan Marek wrote: > This fixes the case when CONFIG_QCOM_SCM is not enabled, and linux/errno.h > has not been included previously. > > Signed-off-by: Jonathan Marek Reviewed-by: Bjorn Andersson Regards, Bjorn > --- > drm/msm includes qcom_scm.h, but can be used with non-qcom hw (IMX5) > > include/linux/qcom_scm.h | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/include/linux/qcom_scm.h b/include/linux/qcom_scm.h > index 06996ad4f2bc..1637385bcc17 100644 > --- a/include/linux/qcom_scm.h > +++ b/include/linux/qcom_scm.h > @@ -67,6 +67,9 @@ extern int qcom_scm_iommu_secure_ptbl_init(u64 addr, u32 > size, u32 spare); > extern int qcom_scm_io_readl(phys_addr_t addr, unsigned int *val); > extern int qcom_scm_io_writel(phys_addr_t addr, unsigned int val); > #else > + > +#include > + > static inline > int qcom_scm_set_cold_boot_addr(void *entry, const cpumask_t *cpus) > { > -- > 2.17.1 >
Re: [PATCH 10/12] staging: rtl8188eu: correct indentation in update_wireless_mode()
On Wed, 2018-12-05 at 18:02 +0100, Michael Straube wrote: > Correct indentation in update_wireless_mode() to clear a checkpatch > warning. WARNING: Statements should start on a tabstop [] > diff --git a/drivers/staging/rtl8188eu/core/rtw_wlan_util.c > b/drivers/staging/rtl8188eu/core/rtw_wlan_util.c [] > @@ -1442,7 +1442,7 @@ void update_wireless_mode(struct adapter *padapter) > > if (pmlmeext->cur_wireless_mode & WIRELESS_11B) > update_mgnt_tx_rate(padapter, IEEE80211_CCK_RATE_1MB); > - else > + else > update_mgnt_tx_rate(padapter, IEEE80211_OFDM_RATE_6MB); > } . gcc generally emits smaller code by using a single call and a ternary. $ size drivers/staging/rtl8188eu/core/rtw_wlan_util.o* textdata bss dec hex filename 14974 36 0 150103aa2 drivers/staging/rtl8188eu/core/rtw_wlan_util.o.new 14990 36 0 150263ab2 drivers/staging/rtl8188eu/core/rtw_wlan_util.o.old --- drivers/staging/rtl8188eu/core/rtw_wlan_util.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/staging/rtl8188eu/core/rtw_wlan_util.c b/drivers/staging/rtl8188eu/core/rtw_wlan_util.c index 24918223499b..acad30f53f5d 100644 --- a/drivers/staging/rtl8188eu/core/rtw_wlan_util.c +++ b/drivers/staging/rtl8188eu/core/rtw_wlan_util.c @@ -1440,10 +1440,9 @@ void update_wireless_mode(struct adapter *padapter) rtw_hal_set_hwreg(padapter, HW_VAR_RESP_SIFS, (u8 *)_Timer); - if (pmlmeext->cur_wireless_mode & WIRELESS_11B) - update_mgnt_tx_rate(padapter, IEEE80211_CCK_RATE_1MB); -else - update_mgnt_tx_rate(padapter, IEEE80211_OFDM_RATE_6MB); + update_mgnt_tx_rate(padapter, + pmlmeext->cur_wireless_mode & WIRELESS_11B ? + IEEE80211_CCK_RATE_1MB : IEEE80211_OFDM_RATE_6MB); } void update_bmc_sta_support_rate(struct adapter *padapter, u32 mac_id)
Re: [PATCH 10/12] staging: rtl8188eu: correct indentation in update_wireless_mode()
On Wed, 2018-12-05 at 18:02 +0100, Michael Straube wrote: > Correct indentation in update_wireless_mode() to clear a checkpatch > warning. WARNING: Statements should start on a tabstop [] > diff --git a/drivers/staging/rtl8188eu/core/rtw_wlan_util.c > b/drivers/staging/rtl8188eu/core/rtw_wlan_util.c [] > @@ -1442,7 +1442,7 @@ void update_wireless_mode(struct adapter *padapter) > > if (pmlmeext->cur_wireless_mode & WIRELESS_11B) > update_mgnt_tx_rate(padapter, IEEE80211_CCK_RATE_1MB); > - else > + else > update_mgnt_tx_rate(padapter, IEEE80211_OFDM_RATE_6MB); > } . gcc generally emits smaller code by using a single call and a ternary. $ size drivers/staging/rtl8188eu/core/rtw_wlan_util.o* textdata bss dec hex filename 14974 36 0 150103aa2 drivers/staging/rtl8188eu/core/rtw_wlan_util.o.new 14990 36 0 150263ab2 drivers/staging/rtl8188eu/core/rtw_wlan_util.o.old --- drivers/staging/rtl8188eu/core/rtw_wlan_util.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/staging/rtl8188eu/core/rtw_wlan_util.c b/drivers/staging/rtl8188eu/core/rtw_wlan_util.c index 24918223499b..acad30f53f5d 100644 --- a/drivers/staging/rtl8188eu/core/rtw_wlan_util.c +++ b/drivers/staging/rtl8188eu/core/rtw_wlan_util.c @@ -1440,10 +1440,9 @@ void update_wireless_mode(struct adapter *padapter) rtw_hal_set_hwreg(padapter, HW_VAR_RESP_SIFS, (u8 *)_Timer); - if (pmlmeext->cur_wireless_mode & WIRELESS_11B) - update_mgnt_tx_rate(padapter, IEEE80211_CCK_RATE_1MB); -else - update_mgnt_tx_rate(padapter, IEEE80211_OFDM_RATE_6MB); + update_mgnt_tx_rate(padapter, + pmlmeext->cur_wireless_mode & WIRELESS_11B ? + IEEE80211_CCK_RATE_1MB : IEEE80211_OFDM_RATE_6MB); } void update_bmc_sta_support_rate(struct adapter *padapter, u32 mac_id)
Re: [PATCH 1/8] perf: Allow to block process in syscall tracepoints
On Wed, 5 Dec 2018 17:05:02 +0100 Jiri Olsa wrote: > diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c > index 3b2490b81918..e55cf9169a03 100644 > --- a/arch/x86/entry/common.c > +++ b/arch/x86/entry/common.c > @@ -60,6 +60,32 @@ static void do_audit_syscall_entry(struct pt_regs *regs, > u32 arch) > } > } > > +static void trace_block_syscall(struct pt_regs *regs, bool enter) > +{ > + current->perf_blocked = true; > + > + do { > + schedule_timeout(100 * HZ); > + current->perf_blocked_cnt = 0; > + > + if (enter) { > + /* perf syscalls:* enter */ > + perf_trace_syscall_enter(regs); > + > + /* perf raw_syscalls:* enter */ > + perf_trace_sys_enter(_sys_enter, regs, > regs->orig_ax); > + } else { > + /* perf syscalls:* enter */ > + perf_trace_syscall_exit(regs); > + > + /* perf raw_syscalls:* enter */ > + perf_trace_sys_exit(_sys_exit, regs, regs->ax); > + } > + } while (current->perf_blocked_cnt); I was thinking, if the process reading the perf buffer dies, how do we tell this task to continue on? -- Steve > + > + current->perf_blocked = false; > +} > +
Re: [PATCH 1/8] perf: Allow to block process in syscall tracepoints
On Wed, 5 Dec 2018 17:05:02 +0100 Jiri Olsa wrote: > diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c > index 3b2490b81918..e55cf9169a03 100644 > --- a/arch/x86/entry/common.c > +++ b/arch/x86/entry/common.c > @@ -60,6 +60,32 @@ static void do_audit_syscall_entry(struct pt_regs *regs, > u32 arch) > } > } > > +static void trace_block_syscall(struct pt_regs *regs, bool enter) > +{ > + current->perf_blocked = true; > + > + do { > + schedule_timeout(100 * HZ); > + current->perf_blocked_cnt = 0; > + > + if (enter) { > + /* perf syscalls:* enter */ > + perf_trace_syscall_enter(regs); > + > + /* perf raw_syscalls:* enter */ > + perf_trace_sys_enter(_sys_enter, regs, > regs->orig_ax); > + } else { > + /* perf syscalls:* enter */ > + perf_trace_syscall_exit(regs); > + > + /* perf raw_syscalls:* enter */ > + perf_trace_sys_exit(_sys_exit, regs, regs->ax); > + } > + } while (current->perf_blocked_cnt); I was thinking, if the process reading the perf buffer dies, how do we tell this task to continue on? -- Steve > + > + current->perf_blocked = false; > +} > +
Re: [PATCH] perf, tools: Support srccode output
On Wed, Dec 05, 2018 at 01:28:38PM +0100, Jiri Olsa wrote: > On Mon, Dec 03, 2018 at 04:18:48PM -0800, Andi Kleen wrote: > > From: Andi Kleen > > > > When looking at PT or brstackinsn traces with perf script > > it can be very useful to see the source code. This adds a simple > > facility to print them with perf script, if the information > > is available through dwarf > > > > % perf record ... > > % perf script -F insn,ip,sym,srccode > > ... > > > > 4004c6 main > > 5 for (i = 0; i < 1000; i++) > >4004cd main > > 5 for (i = 0; i < 1000; i++) > >4004c6 main > > 5 for (i = 0; i < 1000; i++) > >4004cd main > > 5 for (i = 0; i < 1000; i++) > >4004cd main > > 5 for (i = 0; i < 1000; i++) > >4004cd main > > 5 for (i = 0; i < 1000; i++) > >4004cd main > > 5 for (i = 0; i < 1000; i++) > >4004cd main > > 5 for (i = 0; i < 1000; i++) > >4004b3 main > > 6 v++; > > > > % perf record -b ... > > % perf script -F insn,ip,sym,srccode,brstackinsn > > > > ... > >main+22: > > 00400543insn: e8 ca ff ff ff# PRED > > |18 f1(); > > f1: > > 00400512insn: 55 > > |10 { > > 00400513insn: 48 89 e5 > > 00400516insn: b8 00 00 00 00 > > |11 f2(); > > 0040051binsn: e8 d6 ff ff ff# PRED > > f2: > > 004004f6insn: 55 > > |5{ > > 004004f7insn: 48 89 e5 > > 004004fainsn: 8b 05 2c 0b 20 00 > > |6 c = a / b; > > 00400500insn: 8b 0d 2a 0b 20 00 > > 00400506insn: 99 > > 00400507insn: f7 f9 > > 00400509insn: 89 05 29 0b 20 00 > > 0040050finsn: 90 > > |7} > > 00400510insn: 5d > > 00400511insn: c3# PRED > > f1+14: > > 00400520insn: b8 00 00 00 00 > > |12 f2(); > > 00400525insn: e8 cc ff ff ff# PRED > > f2: > > 004004f6insn: 55 > > |5{ > > 004004f7insn: 48 89 e5 > > 004004fainsn: 8b 05 2c 0b 20 00 > > |6 c = a / b; > > > > Not supported for callchains currently, would need some > > layout changes there. > > nice, works nicely, especialy with --xed > > Acked-by: Jiri Olsa > > I need to check, but is there a way to make this work with > perf-with-kcore script? looks like we endup with kcore and > won't allow vmlinux lookups for debug info Hmm, probably not currently. Would somehow need to point to the debuginfo. Perhaps if /proc/kcore was fixed to report the debugid, and then could find it this way. Or we could add an option to perf report to add a vmlinux only for debug info that is not used for reading the code. -andi
Re: [PATCH] perf, tools: Support srccode output
On Wed, Dec 05, 2018 at 01:28:38PM +0100, Jiri Olsa wrote: > On Mon, Dec 03, 2018 at 04:18:48PM -0800, Andi Kleen wrote: > > From: Andi Kleen > > > > When looking at PT or brstackinsn traces with perf script > > it can be very useful to see the source code. This adds a simple > > facility to print them with perf script, if the information > > is available through dwarf > > > > % perf record ... > > % perf script -F insn,ip,sym,srccode > > ... > > > > 4004c6 main > > 5 for (i = 0; i < 1000; i++) > >4004cd main > > 5 for (i = 0; i < 1000; i++) > >4004c6 main > > 5 for (i = 0; i < 1000; i++) > >4004cd main > > 5 for (i = 0; i < 1000; i++) > >4004cd main > > 5 for (i = 0; i < 1000; i++) > >4004cd main > > 5 for (i = 0; i < 1000; i++) > >4004cd main > > 5 for (i = 0; i < 1000; i++) > >4004cd main > > 5 for (i = 0; i < 1000; i++) > >4004b3 main > > 6 v++; > > > > % perf record -b ... > > % perf script -F insn,ip,sym,srccode,brstackinsn > > > > ... > >main+22: > > 00400543insn: e8 ca ff ff ff# PRED > > |18 f1(); > > f1: > > 00400512insn: 55 > > |10 { > > 00400513insn: 48 89 e5 > > 00400516insn: b8 00 00 00 00 > > |11 f2(); > > 0040051binsn: e8 d6 ff ff ff# PRED > > f2: > > 004004f6insn: 55 > > |5{ > > 004004f7insn: 48 89 e5 > > 004004fainsn: 8b 05 2c 0b 20 00 > > |6 c = a / b; > > 00400500insn: 8b 0d 2a 0b 20 00 > > 00400506insn: 99 > > 00400507insn: f7 f9 > > 00400509insn: 89 05 29 0b 20 00 > > 0040050finsn: 90 > > |7} > > 00400510insn: 5d > > 00400511insn: c3# PRED > > f1+14: > > 00400520insn: b8 00 00 00 00 > > |12 f2(); > > 00400525insn: e8 cc ff ff ff# PRED > > f2: > > 004004f6insn: 55 > > |5{ > > 004004f7insn: 48 89 e5 > > 004004fainsn: 8b 05 2c 0b 20 00 > > |6 c = a / b; > > > > Not supported for callchains currently, would need some > > layout changes there. > > nice, works nicely, especialy with --xed > > Acked-by: Jiri Olsa > > I need to check, but is there a way to make this work with > perf-with-kcore script? looks like we endup with kcore and > won't allow vmlinux lookups for debug info Hmm, probably not currently. Would somehow need to point to the debuginfo. Perhaps if /proc/kcore was fixed to report the debugid, and then could find it this way. Or we could add an option to perf report to add a vmlinux only for debug info that is not used for reading the code. -andi
Re: [RFC PATCH 02/14] mm/hms: heterogenenous memory system (HMS) documentation
On 2018-12-04 7:37 p.m., Jerome Glisse wrote: >> >> This came up before for apis even better defined than HMS as well as >> more limited scope, i.e. experimental ABI availability only for -rc >> kernels. Linus said this: >> >> "There are no loopholes. No "but it's been only one release". No, no, >> no. The whole point is that users are supposed to be able to *trust* >> the kernel. If we do something, we keep on doing it. >> >> And if it makes it harder to add new user-visible interfaces, then >> that's a *good* thing." [1] >> >> The takeaway being don't land work-in-progress ABIs in the kernel. >> Once an application depends on it, there are no more incompatible >> changes possible regardless of the warnings, experimental notices, or >> "staging" designation. DAX is experimental because there are cases >> where it currently does not work with respect to another kernel >> feature like xfs-reflink, RDMA. The plan is to fix those, not continue >> to hide behind an experimental designation, and fix them in a way that >> preserves the user visible behavior that has already been exposed, >> i.e. no regressions. >> >> [1]: >> https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-August/004742.html > > So i guess i am heading down the vXX road ... such is my life :) I recommend against it. I really haven't been convinced by any of your arguments for having a second topology tree. The existing topology tree in sysfs already better describes the links between hardware right now, except for the missing GPU links (and those should be addressable within the GPU community). Plus, maybe, some other enhancements to sockets/numa node descriptions if there's something missing there. Then, 'hbind' is another issue but I suspect it would be better implemented as an ioctl on existing GPU interfaces. I certainly can't see any benefit in using it myself. It's better to take an approach that would be less controversial with the community than to brow beat them with a patch set 20+ times until they take it. Logan
Re: [RFC PATCH 02/14] mm/hms: heterogenenous memory system (HMS) documentation
On 2018-12-04 7:37 p.m., Jerome Glisse wrote: >> >> This came up before for apis even better defined than HMS as well as >> more limited scope, i.e. experimental ABI availability only for -rc >> kernels. Linus said this: >> >> "There are no loopholes. No "but it's been only one release". No, no, >> no. The whole point is that users are supposed to be able to *trust* >> the kernel. If we do something, we keep on doing it. >> >> And if it makes it harder to add new user-visible interfaces, then >> that's a *good* thing." [1] >> >> The takeaway being don't land work-in-progress ABIs in the kernel. >> Once an application depends on it, there are no more incompatible >> changes possible regardless of the warnings, experimental notices, or >> "staging" designation. DAX is experimental because there are cases >> where it currently does not work with respect to another kernel >> feature like xfs-reflink, RDMA. The plan is to fix those, not continue >> to hide behind an experimental designation, and fix them in a way that >> preserves the user visible behavior that has already been exposed, >> i.e. no regressions. >> >> [1]: >> https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-August/004742.html > > So i guess i am heading down the vXX road ... such is my life :) I recommend against it. I really haven't been convinced by any of your arguments for having a second topology tree. The existing topology tree in sysfs already better describes the links between hardware right now, except for the missing GPU links (and those should be addressable within the GPU community). Plus, maybe, some other enhancements to sockets/numa node descriptions if there's something missing there. Then, 'hbind' is another issue but I suspect it would be better implemented as an ioctl on existing GPU interfaces. I certainly can't see any benefit in using it myself. It's better to take an approach that would be less controversial with the community than to brow beat them with a patch set 20+ times until they take it. Logan
Re: [PATCH 4.19 000/139] 4.19.7-stable review
On Wed, Dec 05, 2018 at 02:44:59PM -0200, Rafael David Tinoco wrote: > On 12/4/18 8:48 AM, Greg Kroah-Hartman wrote: > > This is the start of the stable review cycle for the 4.19.7 release. > > There are 139 patches in this series, all will be posted as a response > > to this one. If anyone has any issues with these being applied, please > > let me know. > > > > Responses should be made by Thu Dec 6 10:36:22 UTC 2018. > > Anything received after that time might be too late. > > > > The whole patch series can be found in one patch at: > > > > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.7-rc1.gz > > or in the git tree and branch at: > > > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > > linux-4.19.y > > and the diffstat can be found below. > > > > thanks, > > > > greg k-h > > > Note: As a consequence of the retpoline backport, we upgraded our > toolchain that is used to build the kernel and userspace from 7.0 to > 7.3 (containing retpoline support). Specifically, we went from Linaro's gcc version 7.1.1 20170707 (Linaro GCC 7.1-2017.08) to upstream's gcc version 7.3.0 (GCC). > > Results from Linaro’s test farm. > No regressions on arm64, arm, x86_64, and i386. > > Summary > > > kernel: 4.19.7-rc1 > git repo: > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > git branch: linux-4.19.y > git commit: 987a6da5152c29e37cc11de9a2d10a23a48015c9 > git describe: v4.19.6-140-g987a6da5152c > Test details: > https://qa-reports.linaro.org/lkft/linux-stable-rc-4.19-oe/build/v4.19.6-140-g987a6da5152c > > > No Regressions (compared to build v4.19.6) > > > No Fixes (compared to build v4.19.6) > > > Ran 20285 total tests in the following environments and test suites. > > Environments > -- > - dragonboard-410c - arm64 > - hi6220-hikey - arm64 > - i386 > - juno-r2 - arm64 > - qemu_arm > - qemu_arm64 > - qemu_i386 > - qemu_x86_64 > - x15 - arm > - x86_64 > > Test Suites > --- > * boot > * install-android-platform-tools-r2600 > * kselftest > * libhugetlbfs > * ltp-cap_bounds-tests > * ltp-containers-tests > * ltp-cve-tests > * ltp-fcntl-locktests-tests > * ltp-filecaps-tests > * ltp-fs-tests > * ltp-fs_bind-tests > * ltp-fs_perms_simple-tests > * ltp-fsx-tests > * ltp-hugetlb-tests > * ltp-io-tests > * ltp-ipc-tests > * ltp-math-tests > * ltp-nptl-tests > * ltp-pty-tests > * ltp-sched-tests > * ltp-securebits-tests > * ltp-syscalls-tests > * ltp-timers-tests > * ltp-open-posix-tests > * kselftest-vsyscall-mode-native > * kselftest-vsyscall-mode-none > > -- > Linaro LKFT > https://lkft.linaro.org
Re: [PATCH 4.19 000/139] 4.19.7-stable review
On Wed, Dec 05, 2018 at 02:44:59PM -0200, Rafael David Tinoco wrote: > On 12/4/18 8:48 AM, Greg Kroah-Hartman wrote: > > This is the start of the stable review cycle for the 4.19.7 release. > > There are 139 patches in this series, all will be posted as a response > > to this one. If anyone has any issues with these being applied, please > > let me know. > > > > Responses should be made by Thu Dec 6 10:36:22 UTC 2018. > > Anything received after that time might be too late. > > > > The whole patch series can be found in one patch at: > > > > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.7-rc1.gz > > or in the git tree and branch at: > > > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > > linux-4.19.y > > and the diffstat can be found below. > > > > thanks, > > > > greg k-h > > > Note: As a consequence of the retpoline backport, we upgraded our > toolchain that is used to build the kernel and userspace from 7.0 to > 7.3 (containing retpoline support). Specifically, we went from Linaro's gcc version 7.1.1 20170707 (Linaro GCC 7.1-2017.08) to upstream's gcc version 7.3.0 (GCC). > > Results from Linaro’s test farm. > No regressions on arm64, arm, x86_64, and i386. > > Summary > > > kernel: 4.19.7-rc1 > git repo: > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > git branch: linux-4.19.y > git commit: 987a6da5152c29e37cc11de9a2d10a23a48015c9 > git describe: v4.19.6-140-g987a6da5152c > Test details: > https://qa-reports.linaro.org/lkft/linux-stable-rc-4.19-oe/build/v4.19.6-140-g987a6da5152c > > > No Regressions (compared to build v4.19.6) > > > No Fixes (compared to build v4.19.6) > > > Ran 20285 total tests in the following environments and test suites. > > Environments > -- > - dragonboard-410c - arm64 > - hi6220-hikey - arm64 > - i386 > - juno-r2 - arm64 > - qemu_arm > - qemu_arm64 > - qemu_i386 > - qemu_x86_64 > - x15 - arm > - x86_64 > > Test Suites > --- > * boot > * install-android-platform-tools-r2600 > * kselftest > * libhugetlbfs > * ltp-cap_bounds-tests > * ltp-containers-tests > * ltp-cve-tests > * ltp-fcntl-locktests-tests > * ltp-filecaps-tests > * ltp-fs-tests > * ltp-fs_bind-tests > * ltp-fs_perms_simple-tests > * ltp-fsx-tests > * ltp-hugetlb-tests > * ltp-io-tests > * ltp-ipc-tests > * ltp-math-tests > * ltp-nptl-tests > * ltp-pty-tests > * ltp-sched-tests > * ltp-securebits-tests > * ltp-syscalls-tests > * ltp-timers-tests > * ltp-open-posix-tests > * kselftest-vsyscall-mode-native > * kselftest-vsyscall-mode-none > > -- > Linaro LKFT > https://lkft.linaro.org
Re: [PATCH v3] signal: add procfd_send_signal() syscall
Hi Christian, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on linus/master] [also build test WARNING on v4.20-rc5] [cannot apply to next-20181205] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Christian-Brauner/signal-add-procfd_send_signal-syscall/20181205-174512 config: sh-titan_defconfig (attached as .config) compiler: sh4-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=7.2.0 make.cross ARCH=sh All warnings (new ones prefixed by >>): :1317:2: warning: #warning syscall pkey_mprotect not implemented [-Wcpp] :1320:2: warning: #warning syscall pkey_alloc not implemented [-Wcpp] :1323:2: warning: #warning syscall pkey_free not implemented [-Wcpp] :1326:2: warning: #warning syscall statx not implemented [-Wcpp] :1332:2: warning: #warning syscall io_pgetevents not implemented [-Wcpp] :1335:2: warning: #warning syscall rseq not implemented [-Wcpp] >> :1338:2: warning: #warning syscall procfd_send_signal not implemented >> [-Wcpp] --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH v3] signal: add procfd_send_signal() syscall
Hi Christian, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on linus/master] [also build test WARNING on v4.20-rc5] [cannot apply to next-20181205] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Christian-Brauner/signal-add-procfd_send_signal-syscall/20181205-174512 config: sh-titan_defconfig (attached as .config) compiler: sh4-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=7.2.0 make.cross ARCH=sh All warnings (new ones prefixed by >>): :1317:2: warning: #warning syscall pkey_mprotect not implemented [-Wcpp] :1320:2: warning: #warning syscall pkey_alloc not implemented [-Wcpp] :1323:2: warning: #warning syscall pkey_free not implemented [-Wcpp] :1326:2: warning: #warning syscall statx not implemented [-Wcpp] :1332:2: warning: #warning syscall io_pgetevents not implemented [-Wcpp] :1335:2: warning: #warning syscall rseq not implemented [-Wcpp] >> :1338:2: warning: #warning syscall procfd_send_signal not implemented >> [-Wcpp] --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH V2 5/5] PM / Domains: Propagate performance state updates
On Wed, 5 Dec 2018 at 07:42, Stephen Boyd wrote: > > Quoting Ulf Hansson (2018-12-03 05:38:35) > > + Stephen, Mike, Graham > > > > On Fri, 30 Nov 2018 at 12:06, Viresh Kumar wrote: > > > > > > On 30-11-18, 11:18, Ulf Hansson wrote: > > There is one a big difference while comparing with clocks, which make > > this more difficult. > > > > That is, in dev_pm_genpd_set_performance_state(), we are *not* calling > > ->the set_performance_state() callback of the genpd, unless the genpd > > is already powered on. Instead, for that case, we are only aggregating > > the performance states votes, to defer to invoke > > ->set_performance_state() until the genpd becomes powered on. In some > > way this makes sense, but for clock_set_rate(), the clock's rate can > > be changed, no matter if the clock has been prepared/enabled or not. > > > > I recall we discussed this behavior of genpd, while introducing the > > performance states support to it. Reaching this point, introducing the > > master-domain propagation of performance states votes, we may need to > > re-consider the behavior, as there is evidently an overhead that grows > > along with the hierarchy. > > > > As a matter of fact, what I think this boils to, is to consider if we > > want to temporary drop the performance state vote for a device from > > genpd's ->runtime_suspend() callback. Thus, also restore the vote from > > genpd's ->runtime_resume() callback. That's because, this is directly > > related to whether genpd should care about whether it's powered on or > > off, when calling the ->set_performance_state(). We have had > > discussions at LKML already around this topic. It seems like we need > > to pick them up to reach a consensus, before we can move forward with > > this. > > What are we trying to gain consensus on exactly? Apologize for not being more clear. Let me re-phrase the question. Should genpd (via genpd's internal runtime suspend callback) reset the performance state for a device to zero, when that device becomes runtime suspended? Of course, if such reset is done, then genpd should also re-compute the aggregation of the performance states for the corresponding PM domain (and its master domains). Consequentially, these operations then needs to be reversed, when the device in question becomes runtime resumed. > > I've been thinking that we would want there to be a performance state > 'target' for the 'devices are runtime suspended' state that we apply > when the genpd is powered down from runtime PM suspend of all devices in > the domain. This would then be overwritten with whatever the aggregated > performance state is when the genpd is powered back on due to some > device runtime resuming. >From the above, I assume "target" to not always be zero. That confirms my understanding from earlier discussions. That is sort of what we already have today. The consumer driver are free to change the performance state, via dev_pm_genpd_set_performance_state() (or indirectly via dev_pm_opp_set_rate()), for its device at any time. For example, it may want to set a new performance state at runtime suspend. > Don't we already sort of have support for this > right now in genpd with commit fc5cbf0c94b6 ("PM / Domains: Support for > multiple states")? So I would think that we either let that state_idx > member be 0 or some implementation defined 'off' state index of the > genpd that can be achieved. Well, implementation details, but thanks for sharing your ideas. > I was also thinking that the genpd_power_state was more than just a > bunch of idle states of a device so that we could combine > on/off/idle/active(performance states?) into one genpd_power_state > space that is affected by idle and on/off runtime PM status. > > In the Qualcomm use-case, the state of the clk, either on or off, is > factored into the decision to consider the voltage constraint that the > clk has on the CX domain. So when the clk is disabled, the voltage > constraint on CX can be removed/ignored in the aggregation equation > because the hardware isn't clocking. This needs to work properly so that > devices can disable their clks and save power. > > I hope that we can move clk on/off/rate control to be implemented with > clk domains based upon genpds so that we can generically reason about > genpds being on/off and at some performance state (i.e. a frequency) > instead of making something clk specific in the genpd code. If we can do > that, then this can be stated as a slave genpd (the clk domain) that's > linked to a master genpd (the voltage/corner domain) can only affect the > performance state requirement of the master genpd when the slave genpd > is on. When the slave genpd is off, the performance state requirement is > dropped and the master domain settles to a lower requirement based on > the other domains linked to it that are on. The clk domain would turn on > and off when the set of devices it is attached to are all suspended and > that would "do the right thing"
Re: [PATCH V2 5/5] PM / Domains: Propagate performance state updates
On Wed, 5 Dec 2018 at 07:42, Stephen Boyd wrote: > > Quoting Ulf Hansson (2018-12-03 05:38:35) > > + Stephen, Mike, Graham > > > > On Fri, 30 Nov 2018 at 12:06, Viresh Kumar wrote: > > > > > > On 30-11-18, 11:18, Ulf Hansson wrote: > > There is one a big difference while comparing with clocks, which make > > this more difficult. > > > > That is, in dev_pm_genpd_set_performance_state(), we are *not* calling > > ->the set_performance_state() callback of the genpd, unless the genpd > > is already powered on. Instead, for that case, we are only aggregating > > the performance states votes, to defer to invoke > > ->set_performance_state() until the genpd becomes powered on. In some > > way this makes sense, but for clock_set_rate(), the clock's rate can > > be changed, no matter if the clock has been prepared/enabled or not. > > > > I recall we discussed this behavior of genpd, while introducing the > > performance states support to it. Reaching this point, introducing the > > master-domain propagation of performance states votes, we may need to > > re-consider the behavior, as there is evidently an overhead that grows > > along with the hierarchy. > > > > As a matter of fact, what I think this boils to, is to consider if we > > want to temporary drop the performance state vote for a device from > > genpd's ->runtime_suspend() callback. Thus, also restore the vote from > > genpd's ->runtime_resume() callback. That's because, this is directly > > related to whether genpd should care about whether it's powered on or > > off, when calling the ->set_performance_state(). We have had > > discussions at LKML already around this topic. It seems like we need > > to pick them up to reach a consensus, before we can move forward with > > this. > > What are we trying to gain consensus on exactly? Apologize for not being more clear. Let me re-phrase the question. Should genpd (via genpd's internal runtime suspend callback) reset the performance state for a device to zero, when that device becomes runtime suspended? Of course, if such reset is done, then genpd should also re-compute the aggregation of the performance states for the corresponding PM domain (and its master domains). Consequentially, these operations then needs to be reversed, when the device in question becomes runtime resumed. > > I've been thinking that we would want there to be a performance state > 'target' for the 'devices are runtime suspended' state that we apply > when the genpd is powered down from runtime PM suspend of all devices in > the domain. This would then be overwritten with whatever the aggregated > performance state is when the genpd is powered back on due to some > device runtime resuming. >From the above, I assume "target" to not always be zero. That confirms my understanding from earlier discussions. That is sort of what we already have today. The consumer driver are free to change the performance state, via dev_pm_genpd_set_performance_state() (or indirectly via dev_pm_opp_set_rate()), for its device at any time. For example, it may want to set a new performance state at runtime suspend. > Don't we already sort of have support for this > right now in genpd with commit fc5cbf0c94b6 ("PM / Domains: Support for > multiple states")? So I would think that we either let that state_idx > member be 0 or some implementation defined 'off' state index of the > genpd that can be achieved. Well, implementation details, but thanks for sharing your ideas. > I was also thinking that the genpd_power_state was more than just a > bunch of idle states of a device so that we could combine > on/off/idle/active(performance states?) into one genpd_power_state > space that is affected by idle and on/off runtime PM status. > > In the Qualcomm use-case, the state of the clk, either on or off, is > factored into the decision to consider the voltage constraint that the > clk has on the CX domain. So when the clk is disabled, the voltage > constraint on CX can be removed/ignored in the aggregation equation > because the hardware isn't clocking. This needs to work properly so that > devices can disable their clks and save power. > > I hope that we can move clk on/off/rate control to be implemented with > clk domains based upon genpds so that we can generically reason about > genpds being on/off and at some performance state (i.e. a frequency) > instead of making something clk specific in the genpd code. If we can do > that, then this can be stated as a slave genpd (the clk domain) that's > linked to a master genpd (the voltage/corner domain) can only affect the > performance state requirement of the master genpd when the slave genpd > is on. When the slave genpd is off, the performance state requirement is > dropped and the master domain settles to a lower requirement based on > the other domains linked to it that are on. The clk domain would turn on > and off when the set of devices it is attached to are all suspended and > that would "do the right thing"
Re: [PATCH] power: supply: cpcap-battery: make array cpcap_battery_irqs static, shrinks object size
Hi, On Thu, Nov 29, 2018 at 11:03:42PM +, Colin King wrote: > From: Colin Ian King > > Don't populate the array cpcap_battery_irqs on the stack but instead > make it static. Makes the object code smaller by 99 bytes: > > Before: >text data bss dec hex filename > 13673 2448 0 161213ef9 cpcap-battery.o > > After: >text data bss dec hex filename > 13510 2512 0 160223e96 cpcap-battery.o > > (gcc version 8.2.0 x86_64) > > Signed-off-by: Colin Ian King > --- Thanks, queued. -- Sebastian > drivers/power/supply/cpcap-battery.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/power/supply/cpcap-battery.c > b/drivers/power/supply/cpcap-battery.c > index 98ba07869c3b..204e22e3e9b4 100644 > --- a/drivers/power/supply/cpcap-battery.c > +++ b/drivers/power/supply/cpcap-battery.c > @@ -620,7 +620,7 @@ static int cpcap_battery_init_irq(struct platform_device > *pdev, > static int cpcap_battery_init_interrupts(struct platform_device *pdev, >struct cpcap_battery_ddata *ddata) > { > - const char * const cpcap_battery_irqs[] = { > + static const char * const cpcap_battery_irqs[] = { > "eol", "lowbph", "lowbpl", > "chrgcurr1", "battdetb" > }; > -- > 2.19.1 > signature.asc Description: PGP signature
Re: [PATCH] power: supply: cpcap-battery: make array cpcap_battery_irqs static, shrinks object size
Hi, On Thu, Nov 29, 2018 at 11:03:42PM +, Colin King wrote: > From: Colin Ian King > > Don't populate the array cpcap_battery_irqs on the stack but instead > make it static. Makes the object code smaller by 99 bytes: > > Before: >text data bss dec hex filename > 13673 2448 0 161213ef9 cpcap-battery.o > > After: >text data bss dec hex filename > 13510 2512 0 160223e96 cpcap-battery.o > > (gcc version 8.2.0 x86_64) > > Signed-off-by: Colin Ian King > --- Thanks, queued. -- Sebastian > drivers/power/supply/cpcap-battery.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/power/supply/cpcap-battery.c > b/drivers/power/supply/cpcap-battery.c > index 98ba07869c3b..204e22e3e9b4 100644 > --- a/drivers/power/supply/cpcap-battery.c > +++ b/drivers/power/supply/cpcap-battery.c > @@ -620,7 +620,7 @@ static int cpcap_battery_init_irq(struct platform_device > *pdev, > static int cpcap_battery_init_interrupts(struct platform_device *pdev, >struct cpcap_battery_ddata *ddata) > { > - const char * const cpcap_battery_irqs[] = { > + static const char * const cpcap_battery_irqs[] = { > "eol", "lowbph", "lowbpl", > "chrgcurr1", "battdetb" > }; > -- > 2.19.1 > signature.asc Description: PGP signature
[PATCH] ARM: dts: omap5: Fix dual-role mode on Super-Speed port
OMAP5's Super-Speed USB port has a software mailbox register that needs to be fed with VBUS and ID events from an external VBUS/ID comparator. Without this, Host role will not work correctly. Fixes: 656c1a65ab55 ("ARM: dts: omap5: enable OTG role for DWC3 controller") Reported-by: H. Nikolaus Schaller Signed-off-by: Roger Quadros --- Tony, Please included this in v4.20-rc cycle. Thanks. cheers, -roger arch/arm/boot/dts/omap5-board-common.dtsi | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm/boot/dts/omap5-board-common.dtsi b/arch/arm/boot/dts/omap5-board-common.dtsi index bf7ca00..bc853eb 100644 --- a/arch/arm/boot/dts/omap5-board-common.dtsi +++ b/arch/arm/boot/dts/omap5-board-common.dtsi @@ -701,6 +701,7 @@ }; { + extcon = <_usb3>; dr_mode = "otg"; }; -- Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
[PATCH] ARM: dts: omap5: Fix dual-role mode on Super-Speed port
OMAP5's Super-Speed USB port has a software mailbox register that needs to be fed with VBUS and ID events from an external VBUS/ID comparator. Without this, Host role will not work correctly. Fixes: 656c1a65ab55 ("ARM: dts: omap5: enable OTG role for DWC3 controller") Reported-by: H. Nikolaus Schaller Signed-off-by: Roger Quadros --- Tony, Please included this in v4.20-rc cycle. Thanks. cheers, -roger arch/arm/boot/dts/omap5-board-common.dtsi | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm/boot/dts/omap5-board-common.dtsi b/arch/arm/boot/dts/omap5-board-common.dtsi index bf7ca00..bc853eb 100644 --- a/arch/arm/boot/dts/omap5-board-common.dtsi +++ b/arch/arm/boot/dts/omap5-board-common.dtsi @@ -701,6 +701,7 @@ }; { + extcon = <_usb3>; dr_mode = "otg"; }; -- Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()
On 12/4/18 6:13 PM, Jerome Glisse wrote: > On Tue, Dec 04, 2018 at 05:06:49PM -0800, Dave Hansen wrote: >> OK, but there are 1024*1024 matrix cells on a systems with 1024 >> proximity domains (ACPI term for NUMA node). So it sounds like you are >> proposing a million-directory approach. > > No, pseudo code: > struct list links; > > for (unsigned r = 0; r < nrows; r++) { > for (unsigned c = 0; c < ncolumns; c++) { > if (!link_find(links, hmat[r][c].bandwidth, >hmat[r][c].latency)) { > link = link_new(hmat[r][c].bandwidth, > hmat[r][c].latency); > // add initiator and target correspond to that row > // and columns to this new link > list_add(, links); > } > } > } > > So all cells that have same property are under the same link. OK, so the "link" here is like a cable. It's like saying, "we have a network and everything is connected with an ethernet cable that can do 1gbit/sec". But, what actually connects an initiator to a target? I assume we still need to know which link is used for each target/initiator pair. Where is that enumerated? I think this just means we need a million symlinks to a "link" instead of a million link directories. Still not great. > Note that userspace can parse all this once during its initialization > and create pools of target to use. It sounds like you're agreeing that there is too much data in this interface for applications to _regularly_ parse it. We need some central thing that parses it all and caches the results.
Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()
On 12/4/18 6:13 PM, Jerome Glisse wrote: > On Tue, Dec 04, 2018 at 05:06:49PM -0800, Dave Hansen wrote: >> OK, but there are 1024*1024 matrix cells on a systems with 1024 >> proximity domains (ACPI term for NUMA node). So it sounds like you are >> proposing a million-directory approach. > > No, pseudo code: > struct list links; > > for (unsigned r = 0; r < nrows; r++) { > for (unsigned c = 0; c < ncolumns; c++) { > if (!link_find(links, hmat[r][c].bandwidth, >hmat[r][c].latency)) { > link = link_new(hmat[r][c].bandwidth, > hmat[r][c].latency); > // add initiator and target correspond to that row > // and columns to this new link > list_add(, links); > } > } > } > > So all cells that have same property are under the same link. OK, so the "link" here is like a cable. It's like saying, "we have a network and everything is connected with an ethernet cable that can do 1gbit/sec". But, what actually connects an initiator to a target? I assume we still need to know which link is used for each target/initiator pair. Where is that enumerated? I think this just means we need a million symlinks to a "link" instead of a million link directories. Still not great. > Note that userspace can parse all this once during its initialization > and create pools of target to use. It sounds like you're agreeing that there is too much data in this interface for applications to _regularly_ parse it. We need some central thing that parses it all and caches the results.