Re: [PATCH v2 5/5] powerpc: Remove -mno-sched-epilog
On Fri, Sep 14, 2018 at 03:20:18PM -0700, Nick Desaulniers wrote:
> On Fri, Sep 14, 2018 at 2:56 PM Segher Boessenkool wrote:
> > On Sat, Sep 15, 2018 at 06:43:05AM +1000, Nicholas Piggin wrote:
> > > On Fri, 14 Sep 2018 11:03:38 -0700
> > > Nick Desaulniers wrote:
> > >
> > > Sorry I forgot to cc you. This has links to some of the sched
> > > epilog bugs.
> > >
> > > https://marc.info/?l=linuxppc-embedded&m=153690223909654&w=2
>
> Cool, cc me on the thread if you'd like me to add my reviewed-by tag
> visibly on the thread.
>
> > PR44199 was backported to 4.4 and PR52828 is fixed in 4.8.
>
> Are those GCC versions? If so, what does that mean for the users of
> the many GCC releases between 4.4 and 4.8?

Yes, GCC bug 44199 was fixed in GCC version 4.4, etc. It of course
also is fixed in all later versions; so GCC releases between 4.4 and
4.8 have the fix for PR44199 but not that for PR52828.

I didn't check exactly what 4.4.x versions have the fix, etc. I always
assume anyone using x.y.z uses the highest z available.

I don't know if those are the only fixes you need; those are the two
bugs mentioned in Nicholas' patch (that MARC link above).


Segher
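[Editorial sketch] The version reasoning in the reply above can be restated as a small C helper. This is only an illustration: the function names are hypothetical, patch levels (the "z" in x.y.z) are ignored, and, as Segher says, it is not known whether these two PRs are the only fixes needed.

```c
#include <stdbool.h>

/* Hypothetical helper: true if major.minor is at least req_major.req_minor. */
bool gcc_at_least(int major, int minor, int req_major, int req_minor)
{
	return major > req_major ||
	       (major == req_major && minor >= req_minor);
}

/* PR44199: per the thread, fixed in the 4.4 series and all later versions. */
bool has_pr44199_fix(int major, int minor)
{
	return gcc_at_least(major, minor, 4, 4);
}

/* PR52828: per the thread, fixed in 4.8 and later. */
bool has_pr52828_fix(int major, int minor)
{
	return gcc_at_least(major, minor, 4, 8);
}

/*
 * Assumption: the workaround flag is only a candidate for removal when
 * both fixes are present. Other fixes may be required as well.
 */
bool can_drop_no_sched_epilog(int major, int minor)
{
	return has_pr44199_fix(major, minor) && has_pr52828_fix(major, minor);
}
```

Under these assumptions a 4.6 compiler has the PR44199 fix but not the PR52828 fix, which is the case Nick asked about.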
[PATCH 3/3] powerpc: uapi header and system call table file generation
The system call table generation script must be run to generate the
unistd_32/64.h and syscall_table_32/64/c32.h files. This patch adds
the changes that invoke the script.

With this patch, unistd_32/64.h and syscall_table_32/64/c32.h are
generated by the syscall table generation script, invoked from
arch/powerpc/Makefile, and the generated files are identical to the
files being removed.

The generated uapi header file will be included in
uapi/asm/unistd_32/64.h and the generated system call table support
file will be included by arch/powerpc/kernel/syscall_table_32/64.S.

Signed-off-by: Firoz Khan
---
 arch/powerpc/Makefile                  |   3 +
 arch/powerpc/include/asm/Kbuild        |   3 +
 arch/powerpc/include/uapi/asm/Kbuild   |   2 +
 arch/powerpc/include/uapi/asm/unistd.h | 393 +
 arch/powerpc/kernel/Makefile           |   3 +-
 arch/powerpc/kernel/syscall_table_32.S |   9 +
 arch/powerpc/kernel/syscall_table_64.S |  17 ++
 arch/powerpc/kernel/systbl.S           |  50 -
 8 files changed, 39 insertions(+), 441 deletions(-)
 create mode 100644 arch/powerpc/kernel/syscall_table_32.S
 create mode 100644 arch/powerpc/kernel/syscall_table_64.S
 delete mode 100644 arch/powerpc/kernel/systbl.S

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 11a1acb..90614c9 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -400,6 +400,9 @@ archclean:
 
 archprepare: checkbin
 
+archheaders:
+	$(Q)$(MAKE) $(build)=arch/powerpc/kernel/syscalls all
+
 # Use the file '.tmp_gas_check' for binutils tests, as gas won't output
 # to stdout and these checks are run even on install targets.
 TOUT := .tmp_gas_check
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 3196d22..74e63b4 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -8,3 +8,6 @@ generic-y += preempt.h
 generic-y += rwsem.h
 generic-y += vtime.h
 generic-y += msi.h
+generated-y += syscall_table_32.h
+generated-y += syscall_table_64.h
+generated-y += syscall_table_c32.h
\ No newline at end of file
diff --git a/arch/powerpc/include/uapi/asm/Kbuild b/arch/powerpc/include/uapi/asm/Kbuild
index 1a6ed59..a731c5b 100644
--- a/arch/powerpc/include/uapi/asm/Kbuild
+++ b/arch/powerpc/include/uapi/asm/Kbuild
@@ -7,3 +7,5 @@ generic-y += poll.h
 generic-y += resource.h
 generic-y += sockios.h
 generic-y += statfs.h
+generated-y += unistd_32.h
+generated-y += unistd_64.h
\ No newline at end of file
diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h
index f999df2..9084a0c 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -10,397 +10,10 @@
 #ifndef _UAPI_ASM_POWERPC_UNISTD_H_
 #define _UAPI_ASM_POWERPC_UNISTD_H_
-
-#define __NR_restart_syscall 0
-#define __NR_exit 1
-#define __NR_fork 2
-#define __NR_read 3
-#define __NR_write 4
-#define __NR_open 5
-#define __NR_close 6
-#define __NR_waitpid 7
-#define __NR_creat 8
-#define __NR_link 9
-#define __NR_unlink 10
-#define __NR_execve 11
-#define __NR_chdir 12
-#define __NR_time 13
-#define __NR_mknod 14
-#define __NR_chmod 15
-#define __NR_lchown 16
-#define __NR_break 17
-#define __NR_oldstat 18
-#define __NR_lseek 19
-#define __NR_getpid 20
-#define __NR_mount 21
-#define __NR_umount 22
-#define __NR_setuid 23
-#define __NR_getuid 24
-#define __NR_stime 25
-#define __NR_ptrace 26
-#define __NR_alarm 27
-#define __NR_oldfstat 28
-#define __NR_pause 29
-#define __NR_utime 30
-#define __NR_stty 31
-#define __NR_gtty 32
-#define __NR_access 33
-#define __NR_nice 34
-#define __NR_ftime 35
-#define __NR_sync 36
-#define __NR_kill 37
-#define __NR_rename 38
-#define __NR_mkdir 39
-#define __NR_rmdir 40
-#define __NR_dup 41
-#define __NR_pipe 42
-#define __NR_times 43
-#define __NR_prof 44
-#define __NR_brk 45
-#define __NR_setgid 46
-#define __NR_getgid 47
-#define __NR_signal 48
-#define __NR_geteuid 49
-#define __NR_getegid 50
-#define __NR_acct 51
-#define __NR_umount2 52
-#define __NR_lock 53
-#define __NR_ioctl 54
-#define __NR_fcntl 55
-#define __NR_mpx 56
-#define __NR_setpgid 57
-#define __NR_ulimit 58
-#define __NR_oldolduname 59
-#define
[PATCH 2/3] powerpc: Add system call table generation support
The system call tables are in a different format in every architecture,
which makes it difficult to add or modify system calls manually in the
respective files. To make this easier, keep a script that generates the
header file and the syscall table file; this change will unify them
across all architectures.

The system call table generation scripts are added in the syscalls
directory, which contains the scripts that generate both the uapi
header file and the system call table file, and the syscall_32/64.tbl
files that are the input for the scripts.

syscall_32/64.tbl contains the list of available system calls along
with the system call number and corresponding entry point. Adding a new
system call to this architecture is possible by adding a new entry to
the syscall_32/64.tbl file.

A new table entry consists of:
	- System call number.
	- ABI.
	- System call name.
	- Entry point name.
	- Compat entry name, if required.

syscallhdr.sh and syscalltbl.sh generate the uapi headers
unistd_32/64.h and the syscall_table_32/64/c32.h files respectively.
The syscall_table_32/64/c32.h files are included by syscall.S, the
real system call table. Both .sh files parse the content of
syscall.tbl to generate the header and table files.

ARM, s390 and x86 architectures already have similar support. I
leveraged their implementation to come up with a generic solution.
Signed-off-by: Firoz Khan
---
 arch/powerpc/kernel/syscalls/Makefile       |  51
 arch/powerpc/kernel/syscalls/syscall_32.tbl | 378
 arch/powerpc/kernel/syscalls/syscall_64.tbl | 372 +++
 arch/powerpc/kernel/syscalls/syscallhdr.sh  |  37 +++
 arch/powerpc/kernel/syscalls/syscalltbl.sh  |  38 +++
 5 files changed, 876 insertions(+)
 create mode 100644 arch/powerpc/kernel/syscalls/Makefile
 create mode 100644 arch/powerpc/kernel/syscalls/syscall_32.tbl
 create mode 100644 arch/powerpc/kernel/syscalls/syscall_64.tbl
 create mode 100644 arch/powerpc/kernel/syscalls/syscallhdr.sh
 create mode 100644 arch/powerpc/kernel/syscalls/syscalltbl.sh

diff --git a/arch/powerpc/kernel/syscalls/Makefile b/arch/powerpc/kernel/syscalls/Makefile
new file mode 100644
index 000..0c87acb
--- /dev/null
+++ b/arch/powerpc/kernel/syscalls/Makefile
@@ -0,0 +1,51 @@
+# SPDX-License-Identifier: GPL-2.0
+out := arch/$(SRCARCH)/include/generated/asm
+uapi := arch/$(SRCARCH)/include/generated/uapi/asm
+
+_dummy := $(shell [ -d '$(uapi)' ] || mkdir -p '$(uapi)') \
+	  $(shell [ -d '$(out)' ] || mkdir -p '$(out)')
+
+syscall32 := $(srctree)/$(src)/syscall_32.tbl
+syscall64 := $(srctree)/$(src)/syscall_64.tbl
+
+syshdr := $(srctree)/$(src)/syscallhdr.sh
+systbl := $(srctree)/$(src)/syscalltbl.sh
+
+quiet_cmd_syshdr = SYSHDR $@
+      cmd_syshdr = $(CONFIG_SHELL) '$(syshdr)' '$<' '$@' \
+		   '$(syshdr_abi_$(basetarget))' \
+		   '$(syshdr_pfx_$(basetarget))' \
+		   '$(syshdr_offset_$(basetarget))'
+
+quiet_cmd_systbl = SYSTBL $@
+      cmd_systbl = $(CONFIG_SHELL) '$(systbl)' '$<' '$@' \
+		   '$(systbl_abi_$(basetarget))'
+
+$(uapi)/unistd_32.h: $(syscall32) $(syshdr)
+	$(call if_changed,syshdr)
+
+$(uapi)/unistd_64.h: $(syscall64) $(syshdr)
+	$(call if_changed,syshdr)
+
+systbl_abi_syscall_table_32 := 32
+$(out)/syscall_table_32.h: $(syscall32) $(systbl)
+	$(call if_changed,systbl)
+
+systbl_abi_syscall_table_64 := 64
+$(out)/syscall_table_64.h: $(syscall64) $(systbl)
+	$(call if_changed,systbl)
+
+systbl_abi_syscall_table_c32 := c32
+$(out)/syscall_table_c32.h: $(syscall32) $(systbl)
+	$(call if_changed,systbl)
+
+uapisyshdr-y += unistd_32.h unistd_64.h
+syshdr-y += syscall_table_32.h syscall_table_64.h \
+	    syscall_table_c32.h
+
+targets += $(uapisyshdr-y) $(syshdr-y)
+
+PHONY += all
+all: $(addprefix $(uapi)/,$(uapisyshdr-y))
+all: $(addprefix $(out)/,$(syshdr-y))
+	@:
diff --git a/arch/powerpc/kernel/syscalls/syscall_32.tbl b/arch/powerpc/kernel/syscalls/syscall_32.tbl
new file mode 100644
index 000..50c419c
--- /dev/null
+++ b/arch/powerpc/kernel/syscalls/syscall_32.tbl
@@ -0,0 +1,378 @@
+#
+# 32-bit system call numbers and entry vectors
+#
+# The format is:
+#
+#
+# The abi is always "common" for this file.
+#
+0	common	restart_syscall	sys_restart_syscall
+1	common	exit	sys_exit
+2	common	fork	ppc_fork
+3	common	read	sys_read
+4	common	write	sys_write
+5	common	open	sys_open	compat_sys_open
+6	common	close	sys_close
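[Editorial sketch] To illustrate the table format described in this patch, here is a minimal C function that turns one table line into the corresponding uapi `#define`. It is not the patch's syscallhdr.sh (a shell script whose body is not shown in this excerpt); the function name and buffer sizes are hypothetical, and the field order is taken from the commit message: number, ABI, name, entry point, optional compat entry point.

```c
#include <stdio.h>

/*
 * Parse one syscall table line of the form
 *   "<number> <abi> <name> <entry point> [<compat entry point>]"
 * and write "#define __NR_<name> <number>" into out.
 * Returns 0 on success, -1 for comment, blank or malformed lines.
 */
int syscall_line_to_define(const char *line, char *out, size_t outlen)
{
	unsigned int nr;
	char abi[32], name[64], entry[64];

	/* Comment and empty lines carry no table entry. */
	if (line[0] == '#' || line[0] == '\n' || line[0] == '\0')
		return -1;

	/* The compat entry point is optional, so only four fields are required. */
	if (sscanf(line, "%u %31s %63s %63s", &nr, abi, name, entry) != 4)
		return -1;

	snprintf(out, outlen, "#define __NR_%s %u", name, nr);
	return 0;
}
```

Feeding it the `open` line from the table above would yield `#define __NR_open 5`, matching the entry removed from uapi/asm/unistd.h in patch 3/3.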
[PATCH 1/3] powerpc: Replace NR_syscalls macro from asm/unistd.h
The NR_syscalls macro holds the number of system calls that exist on
the powerpc architecture. This macro is currently part of the
asm/unistd.h file. We have to change its value whenever we add or
delete a system call.

One of the patches in this series adds a script that generates a uapi
header from the syscall.tbl file. The syscall.tbl file contains the
system call number information. So we have two options for updating
the __NR_syscalls value:

1. Update __NR_syscalls in asm/unistd.h manually by counting the
   number of system calls. No update is needed until we either add a
   new system call or delete an existing one.

2. Keep this feature in the above-mentioned script, which counts the
   number of syscalls and keeps the result in a generated file. In
   this case we don't need to explicitly update __NR_syscalls in
   asm/unistd.h.

The second option is the recommended one. For that, I moved the
NR_syscalls macro from asm/unistd.h to uapi/asm/unistd.h. The macro is
also renamed from NR_syscalls to __NR_syscalls to make the naming
convention the same across all architectures. While __NR_syscalls
isn't strictly part of the uapi, having it as part of the generated
header simplifies the implementation.
Signed-off-by: Firoz Khan
---
 arch/powerpc/include/asm/unistd.h      | 3 +--
 arch/powerpc/include/uapi/asm/unistd.h | 2 ++
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
index c19379f..54732f9 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -11,8 +11,7 @@
 #include
-
-#define NR_syscalls 389
+#define NR_syscalls __NR_syscalls
 
 #define __NR__exit __NR_exit
 
diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h
index 985534d..f999df2 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -401,4 +401,6 @@
 #define __NR_rseq 387
 #define __NR_io_pgetevents 388
 
+#define __NR_syscalls 389
+
 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
-- 
1.9.1
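[Editorial sketch] Option 2 above can be pictured as follows. This C function is only one plausible way to derive the value, assuming __NR_syscalls is the highest system call number in the table plus one, which is consistent with __NR_io_pgetevents being 388 and __NR_syscalls being 389 in the diff. The function name is hypothetical and this is not the series' actual script.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the value __NR_syscalls would take for a table held in memory:
 * the highest system call number seen, plus one. Lines that do not start
 * with a number (for example comments) are skipped. Returns 0 if the
 * table holds no entries.
 */
unsigned int nr_syscalls_from_table(const char *table)
{
	unsigned int max = 0, nr;
	int seen = 0;
	const char *line = table;

	while (line && *line) {
		/* The first field of an entry is the system call number. */
		if (sscanf(line, "%u", &nr) == 1) {
			if (!seen || nr > max)
				max = nr;
			seen = 1;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return seen ? max + 1 : 0;
}
```

With this approach, adding entry 389 to the table would bump the generated value to 390 without anyone editing asm/unistd.h by hand.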
[PATCH 0/3] System call table generation support
The purpose of this patch series is:

1. We can easily add/modify/delete a system call by changing an entry
   in the syscall.tbl file. No need to manually edit many files.

2. It is easy to unify the system call implementation across all the
   architectures.

The system call tables are in a different format in every architecture,
which makes it difficult to add or modify system calls manually in the
respective files. To make this easier, keep a script that generates the
header file and the syscall table file; this change will unify them
across all architectures.

syscall.tbl contains the list of available system calls along with the
system call number and corresponding entry point. Adding a new system
call to this architecture is possible by adding a new entry to the
syscall.tbl file.

A new table entry consists of:
	- System call number.
	- ABI.
	- System call name.
	- Entry point name.
	- Compat entry name, if required.

ARM, s390 and x86 architectures already have similar support. I
leveraged their implementation to come up with a generic solution.

I have done the same work for alpha, m68k, microblaze, ia64, mips,
parisc, sh, sparc, and xtensa, but I started by sending the patches
for one architecture for review. The git repository mentioned below
contains more details.

Git repo: https://github.com/frzkhn/system_call_table_generator/

Finally, this is the groundwork for solving the Y2038 issue. We need
to add or change two dozen system calls to solve the Y2038 issue, so
this patch series will make it easier to convert existing system calls
to Y2038-compatible ones.

I started working on system call table generation on 4.17-rc1. I used
Marcin's script (https://github.com/hrw/syscalls-table) to generate
the syscall.tbl file, which is the input to the system call table
generation script. However, a couple of system calls were added in the
latest rc release. If Marcin's script is run on the latest release, it
generates a new syscall.tbl. I still use the old syscall.tbl, and once
all reviews are over I'll update syscall.tbl alone with respect to the
tip of the kernel. The impact is that a few of the system calls won't
work.

Firoz Khan (3):
  powerpc: Replace NR_syscalls macro from asm/unistd.h
  powerpc: Add system call table generation support
  powerpc: uapi header and system call table file generation

 arch/powerpc/Makefile                       |   3 +
 arch/powerpc/include/asm/Kbuild             |   3 +
 arch/powerpc/include/asm/unistd.h           |   3 +-
 arch/powerpc/include/uapi/asm/Kbuild        |   2 +
 arch/powerpc/include/uapi/asm/unistd.h      | 391 +---
 arch/powerpc/kernel/Makefile                |   3 +-
 arch/powerpc/kernel/syscall_table_32.S      |   9 +
 arch/powerpc/kernel/syscall_table_64.S      |  17 ++
 arch/powerpc/kernel/syscalls/Makefile       |  51
 arch/powerpc/kernel/syscalls/syscall_32.tbl | 378 +++
 arch/powerpc/kernel/syscalls/syscall_64.tbl | 372 ++
 arch/powerpc/kernel/syscalls/syscallhdr.sh  |  37 +++
 arch/powerpc/kernel/syscalls/syscalltbl.sh  |  38 +++
 arch/powerpc/kernel/systbl.S                |  50
 14 files changed, 916 insertions(+), 441 deletions(-)
 create mode 100644 arch/powerpc/kernel/syscall_table_32.S
 create mode 100644 arch/powerpc/kernel/syscall_table_64.S
 create mode 100644 arch/powerpc/kernel/syscalls/Makefile
 create mode 100644 arch/powerpc/kernel/syscalls/syscall_32.tbl
 create mode 100644 arch/powerpc/kernel/syscalls/syscall_64.tbl
 create mode 100644 arch/powerpc/kernel/syscalls/syscallhdr.sh
 create mode 100644 arch/powerpc/kernel/syscalls/syscalltbl.sh
 delete mode 100644 arch/powerpc/kernel/systbl.S
-- 
1.9.1
Re: [PATCH v2 5/5] powerpc: Remove -mno-sched-epilog
On Sat, Sep 15, 2018 at 06:43:05AM +1000, Nicholas Piggin wrote:
> On Fri, 14 Sep 2018 11:03:38 -0700
> Nick Desaulniers wrote:
>
> > On Thu, Sep 13, 2018 at 9:07 PM Joel Stanley wrote:
> > > Last time this was proposed there was an issue reported:
> > >
> > > https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-September/121214.html
> > >
> >
> > Heh, did PASemi sell boxes? Interesting, I'll have to read up on my history.
> >
> > On Thu, Sep 13, 2018 at 10:06 PM Nicholas Piggin wrote:
> > > I don't think we can remove it completely because up to at least 4.6
> > > maybe 4.8 has problems.
> > >
> > > I have a few patches lying around I started looking at this... I'll
> > > send them.
> >
> > Yeah, it's too bad the link above doesn't mention gcc version.
> >
> > The gcc bugreport mentions fixing the bug in
> > 7563dc64585324f443f5ac107eb6d89ee813a2d2, not sure how to check what
> > release version of gcc that is? (Do they tag releases?)
>
> I'm not sure, that's not in my gcc tree AFAIKS.

This is a git hash in the kernel tree.

> > Nick, do you have a test case or more context about this still being
> > an issue in gcc 4.8? (maybe I should wait for your patch series?)
>
> Sorry I forgot to cc you. This has links to some of the sched
> epilog bugs.
>
> https://marc.info/?l=linuxppc-embedded&m=153690223909654&w=2

PR44199 was backported to 4.4 and PR52828 is fixed in 4.8.


Segher
Re: [PATCH 3/3] scripts/dtc: Update to upstream version v1.4.7-14-gc86da84d30e4
On 09/13/18 13:28, Rob Herring wrote:
> Major changes are I2C and SPI bus checks, YAML output format (for
> future validation), some new libfdt functions, and more libfdt
> validation of dtbs.
>
> The YAML addition adds an optional dependency on libyaml. pkg-config is
> used to test for it and pkg-config became a kconfig dependency in 4.18.

For Ubuntu, the libyaml dependency is provided by the packages:

   libyaml-0-2
   libyaml-dev

-Frank

>
> This adds the following commits from upstream:
>
> c86da84d30e4 Add support for YAML encoded output
> 361b5e7d8067 Make type_marker_length helper public

< snip >
Re: [PATCH v2 05/17] compat_ioctl: move more drivers to generic_compat_ioctl_ptrarg
On Fri, Sep 14, 2018 at 01:35:06PM -0700, Darren Hart wrote:

> Acked-by: Darren Hart (VMware)
>
> As for a longer term solution, would it be possible to init fops in such
> a way that the compat_ioctl call defaults to generic_compat_ioctl_ptrarg
> so we don't have to duplicate this boilerplate for every ioctl fops
> structure?

Bad idea, that... Because several years down the road somebody will add
an ioctl that takes an unsigned int for argument. Without so much as looking
at your magical mystery macro being used to initialize file_operations.

FWIW, I would name that helper in more blunt way - something like
compat_ioctl_only_compat_pointer_ioctls_here()...
Re: [PATCH v2 5/5] PCI/powerpc/eeh: Add pcibios hooks for preparing to rescan
On 9/12/18 1:39 PM, wrote:
> On Mon, 2018-09-10 at 19:00 +0300, Sergey Miroshnichenko wrote:
>>
>> Yes, missing a real EEH event is possible, unfortunately, and it is
>> indeed worth mentioning.
>>
>> To reduce this probability the next patchset I'll post in a few days
>> among other things puts all the affected device drivers to pause during
>> rescan, mainly because of moving BARs and bridge windows, but it will
>> also help here a bit.
>
> How do you deal with moving BARs etc... within the segmenting
> restrictions of EEH ?
>

Actually, [un]fortunately, we haven't encountered any segmenting issues
yet, but to move BARs we are using the same existing mechanism in Linux
kernel that re-enumerated the PCIe topology during startup with
"pci=realloc" and PCI_REASSIGN_ALL_BUS.

What restrictions must be broken to provoke a segmenting event? Are
there any other limitations on segmenting besides keeping all the BARs
of the PHB within its huge M32+M64 segments which are 2GiB+4GiB on our
setup?

> It's a horrible mess right now and I don't know if the current code can
> even work properly to be honest.
>
> Cheers,
> Ben.
>

Best regards,
Serge
Re: [PATCH v2 2/5] powerpc/boot: Fix crt0.S syntax for clang
On Fri, Sep 14, 2018 at 10:47:08AM -0700, Nick Desaulniers wrote:
> On Thu, Sep 13, 2018 at 9:07 PM Joel Stanley wrote:
> >  10:	addis	r12,r12,(-RELACOUNT)@ha
> > -	cmpdi	r12,RELACOUNT@l
> > +	cmpdi	r12,(RELACOUNT)@l
>
> Yep, as we can see above, when RELACOUNT is negated, it's wrapped in
> parens.

The only thing that does is make it easier for humans to read; it means
exactly the same thing.


Segher
Re: [PATCH 4/5] powerpc/powernv/pci: Enable reassigning the bus numbers
Hello Ben,

On 9/12/18 1:35 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2018-09-05 at 18:40 +0300, Sergey Miroshnichenko wrote:
>> PowerNV doesn't depend on PCIe topology info from DT anymore, and now
>> it is able to enumerate the fabric and assign the bus numbers.
>
> No it's not, at least unless we drop P7 support.
>
> P7 has constraints on the bus ranges being aligned power-of-two for the
> PE assignment to work, which is why we have to honor the firmware
> provided numbers.
>
> Additionally, this breaks the mapping between the firmware idea of the
> bus numbers and Linux idea. This will probably break all of the SR-IOV
> stuff.
>

Oh, I see. To make this more controllable and less intrusive I've bound
the PCI_REASSIGN_ALL_BUS flag to the "pci=realloc" command line argument
(in version 3 of this patchset) instead of the unconditional setting.

> Now we should probably fix it all by removing the FW bits completely
> and doing it all from Linux, though we really need to better handle how
> we deal with the segmented MMIO space.
>
> I would also be wary of what other parts of the code depends on that
> matching between the FW bdfn and the Linux bdfn.
>

This approach allows us to use the same in-kernel hotplug mechanisms for
PowerNV+OPAL and other platforms, so we are highly interested. Would you
kindly advise what the essential parts are to start with, and maybe point
out some documentation on EEH segmentation and FW/OS sync?

> Cheers,
> Ben.
>

Best regards,
Serge

>> Signed-off-by: Sergey Miroshnichenko
>> ---
>>  arch/powerpc/platforms/powernv/pci.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>> index 6d4280086a08..f6eaca3123cd 100644
>> --- a/arch/powerpc/platforms/powernv/pci.c
>> +++ b/arch/powerpc/platforms/powernv/pci.c
>> @@ -1104,6 +1104,7 @@ void __init pnv_pci_init(void)
>>  	struct device_node *np;
>>
>>  	pci_add_flags(PCI_CAN_SKIP_ISA_ALIGN);
>> +	pci_add_flags(PCI_REASSIGN_ALL_BUS);
>>
>>  	/* If we don't have OPAL, eg. in sim, just skip PCI probe */
>>  	if (!firmware_has_feature(FW_FEATURE_OPAL))
>
Re: [PATCH v2 05/17] compat_ioctl: move more drivers to generic_compat_ioctl_ptrarg
On Wed, Sep 12, 2018 at 05:08:52PM +0200, Arnd Bergmann wrote:
> The .ioctl and .compat_ioctl file operations have the same prototype so
> they can both point to the same function, which works great almost all
> the time when all the commands are compatible.
>
> One exception is the s390 architecture, where a compat pointer is only
> 31 bit wide, and converting it into a 64-bit pointer requires calling
> compat_ptr(). Most drivers here will never run in s390, but since we now
> have a generic helper for it, it's easy enough to use it consistently.
>
> I double-checked all these drivers to ensure that all ioctl arguments
> are used as pointers or are ignored, but are not interpreted as integer
> values.
>
> Signed-off-by: Arnd Bergmann
> ---
...
>  drivers/platform/x86/wmi.c | 2 +-
...
>  static void link_event_work(struct work_struct *work)
> diff --git a/drivers/platform/x86/wmi.c b/drivers/platform/x86/wmi.c
> index 04791ea5d97b..e4d0697e07d6 100644
> --- a/drivers/platform/x86/wmi.c
> +++ b/drivers/platform/x86/wmi.c
> @@ -886,7 +886,7 @@ static const struct file_operations wmi_fops = {
>  	.read = wmi_char_read,
>  	.open = wmi_char_open,
>  	.unlocked_ioctl = wmi_ioctl,
> -	.compat_ioctl = wmi_ioctl,
> +	.compat_ioctl = generic_compat_ioctl_ptrarg,
>  };

For platform/drivers/x86:

Acked-by: Darren Hart (VMware)

As for a longer term solution, would it be possible to init fops in such
a way that the compat_ioctl call defaults to generic_compat_ioctl_ptrarg
so we don't have to duplicate this boilerplate for every ioctl fops
structure?

-- 
Darren Hart
VMware Open Source Technology Center
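[Editorial sketch] The s390 point in the quoted commit message can be illustrated in plain user-space C. The two functions below are hypothetical stand-ins, not kernel code: one models a 31-bit compat pointer conversion by clearing the top bit, the other models plain zero-extension on architectures whose compat pointers use all 32 bits.

```c
#include <stdint.h>

/*
 * Model of a 31-bit compat pointer conversion: only the low 31 bits
 * carry address information, so the top bit is cleared before the
 * value is widened to 64 bits.
 */
uint64_t compat_ptr_31bit(uint32_t uptr)
{
	return (uint64_t)(uptr & 0x7fffffffu);
}

/*
 * Model of the conversion where compat pointers use all 32 bits:
 * the value is simply zero-extended.
 */
uint64_t compat_ptr_32bit(uint32_t uptr)
{
	return (uint64_t)uptr;
}
```

The same masking is why the argument must really be a pointer: an integer argument such as 0x80000001 would come out of the 31-bit conversion as 0x00000001, which is the kind of silent corruption the double-check in the commit message is meant to rule out.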
Re: [PATCH v2 5/5] powerpc: Remove -mno-sched-epilog
On Fri, 14 Sep 2018 11:03:38 -0700
Nick Desaulniers wrote:

> On Thu, Sep 13, 2018 at 9:07 PM Joel Stanley wrote:
> > Last time this was proposed there was an issue reported:
> >
> > https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-September/121214.html
> >
>
> Heh, did PASemi sell boxes? Interesting, I'll have to read up on my history.
>
> On Thu, Sep 13, 2018 at 10:06 PM Nicholas Piggin wrote:
> > I don't think we can remove it completely because up to at least 4.6
> > maybe 4.8 has problems.
> >
> > I have a few patches lying around I started looking at this... I'll
> > send them.
>
> Yeah, it's too bad the link above doesn't mention gcc version.
>
> The gcc bugreport mentions fixing the bug in
> 7563dc64585324f443f5ac107eb6d89ee813a2d2, not sure how to check what
> release version of gcc that is? (Do they tag releases?)

I'm not sure, that's not in my gcc tree AFAIKS.

> Nick, do you have a test case or more context about this still being
> an issue in gcc 4.8? (maybe I should wait for your patch series?)

Sorry I forgot to cc you. This has links to some of the sched
epilog bugs.

https://marc.info/?l=linuxppc-embedded&m=153690223909654&w=2

Thanks,
Nick
Re: [PATCH v3 0/6] powerpc/powernv/pci: Discover surprise-hotplugged PCIe devices during rescan
Hello Oliver,

On 9/12/18 12:49 PM, Oliver wrote:
> On Tue, Sep 11, 2018 at 9:56 PM, Sergey Miroshnichenko wrote:
>> This patchset allows hotplugged PCIe devices to be enumerated during a bus
>> rescan being issued via sysfs on PowerNV platforms, when the "Presence
>> Detect Changed" interrupt is not available.
>
> Seems to be on par with the sysfs slot power hack that pnv_php uses.
>

Yes, ours is just for manual initiation of rescan, which helps us with
reliable detection of a bridge hotplug in our particular config.

>> As a first part of our work on adding support for hotplugging PCIe bridges
>> full of devices (without special requirement such as Hot-Plug Controller,
>> reservation of bus numbers and memory regions by firmware, etc.), this
>> series is intended to solve the first two problems of the listed below:
>>
>> I   PowerNV doesn't discover new hotplugged PCIe devices
>> II  EEH is falsely triggered when poking empty slots during the PCIe rescan
>
> We avoid this problem in pnv_php by having OPAL to do the rescan and
> Linux requests a FDT fragment of everything under the slot. I don't
> think it's a great system, but it keeps firmware and the OS on the
> same page.
>

So if we re-enumerate the PCIe topology from Linux, we must then
synchronize with the firmware? How would you recommend approaching that
for PowerNV and OPAL? Can we find a list of criteria somewhere to ensure
that they are properly synced?

>> III The PCI subsystem is not prepared to runtime changes of BAR addresses
>> IV  Device drivers don't track changes of their BAR addresses
>> V   BARs of working devices don't move to make space for new ones
>
> I'm having a really hard time figuring out what would make this
> necessary. Keep in mind that each PHB has its own set of bus numbers
> and its own MMIO space, so it's not like you're short on either.
>
> How are you planning on making this sort of live-device-migration work?
> And what are you trying to do that makes the added complexity worth it?
>

With the "pci=realloc" command line argument and with the
PCI_REASSIGN_ALL_BUS flag the kernel doesn't rely on values of bus
numbers and BAR addresses provided by firmware (OPAL via FDT in our
case, BIOS/UEFI/Coreboot for x86_64), but re-enumerates the PCIe
topology by its own means, and it arranges BARs quite compactly.

Let's say we have two bridges plugged into neighboring ports of the
root/PHB, each of them has a few NVME drives inserted and several empty
slots, when the system boots. Linux makes their bridge windows adjacent,
so if we plug in a new NVME into the first of them, there will be just
no free space to put its BARs. Without considering memory
pre-allocation, the only way we see to free some space for new BARs is
to move existing BARs of the second bridge (in this example).

We've implemented a "firmware-independent" proof-of-concept (not
flawless, though, as you and Ben pointed out) and verified on
PowerNV+OPAL and x86_64 that a running NVME with an ongoing "fio"
benchmark always survives BAR movement during hotplug - of course after
applying a patch that pauses the NVME Linux driver during rescan. The
only visible effect is that bandwidth temporarily drops to 0 for a
second or two, until NVME restarts. The same for a network adapter - an
SSH connection just freezes for a while.

This patchset is a first part of our work, and we've just published [1]
a second part (on BAR movement and pausing the drivers) for the
community to review, discuss and validate.

[1] https://www.spinics.net/lists/linux-pci/msg76211.html

Best regards,
Serge

>> Tested on:
>>  - POWER8 PowerNV+OPAL ppc64le (our Vesnin server) w/ and w/o pci=realloc;
>>  - POWER8 IBM 8247-42L (pSeries);
>>  - POWER8 IBM 8247-42L (PowerNV+OPAL) w/ and w/o pci=realloc.
>>
>> Changes since v2:
>>  - Don't reassign bus numbers on PowerNV by default (to retain the default
>>    behavior), but only when pci=realloc is passed;
>>  - Less code affected;
>>  - pci_add_device_node_info is refactored with add_one_dev_pci_data;
>>  - Minor code cleanup.
>>
>> Changes since v1:
>>  - Fixed build for ppc64le and ppc64be when CONFIG_PCI_IOV is disabled;
>>  - Fixed build for ppc64e when CONFIG_EEH is disabled;
>>  - Fixed code style warnings.
>>
>> Sergey Miroshnichenko (6):
>>   powerpc/pci: Access PCI config space directly w/o pci_dn
>>   powerpc/pci: Create pci_dn on demand
>>   powerpc/pci: Use DT to create pci_dn for root bridges only
>>   powerpc/powernv/pci: Enable reassigning the bus numbers
>>   PCI/powerpc/eeh: Add pcibios hooks for preparing to rescan
>>   powerpc/pci: Reduce code duplication in pci_add_device_node_info
>>
>>  arch/powerpc/include/asm/eeh.h               |   2 +
>>  arch/powerpc/kernel/eeh.c                    |  12 ++
>>  arch/powerpc/kernel/pci_dn.c                 | 119 ++-
>>  arch/powerpc/kernel/rtas_pci.c               |  97 ++-
>>  arch/powerpc/platforms/powernv/eeh-powernv.c |  22
[GIT PULL] Devicetree fix for 4.19-rc
Linus,

Please pull. One regression for a 20 year old PowerMac.

Rob


The following changes since commit 0413bedabc886c3a56804d1c80a58e99077b1d91:

  of: Add device_type access helper functions (2018-08-31 08:30:42 -0400)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git tags/devicetree-fixes-for-4.19-2

for you to fetch changes up to e54192b48da75f025ae4b277925eaf6aca1d13bd:

  of: fix phandle cache creation for DTs with no phandles (2018-09-11 11:28:40 -0500)

Devicetree fixes for 4.19, part 2:

- Fix a regression on systems having a DT without any phandles which
  happens on a PowerMac G3.

Rob Herring (1):
      of: fix phandle cache creation for DTs with no phandles

 drivers/of/base.c | 3 +++
 1 file changed, 3 insertions(+)
Re: [PATCH v2 1/5] powerpc/Makefiles: Fix clang/llvm build
On Thu, Sep 13, 2018 at 9:07 PM Joel Stanley wrote:
>
> From: Anton Blanchard
>
> Commit 15a3204d24a3 ("powerpc/64s: Set assembler machine type to POWER4")
> passes -mpower4 to the assembler. We have more recent instructions in our
> assembly files, but gas permits them. The clang/llvm integrated assembler
> is more strict, and we get a build failure.

Note that we disable clang's integrated assembler in the top level
Makefile for now, but it will still validate constraints for inline
assembly. Do you know which case is meant by "build failure?" Is
there a link to the Clang bug? It would be good to have that context
in the commit message.

>
> Fix this by calling the assembler with -mcpu=power8 if as supports it,
> else fall back to power4.
>
> Suggested-by: Nicholas Piggin
> Signed-off-by: Anton Blanchard
> Signed-off-by: Joel Stanley
> ---
>  arch/powerpc/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index 11a1acba164a..a70639482053 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -238,7 +238,7 @@ cpu-as-$(CONFIG_4xx) += -Wa,-m405
>  cpu-as-$(CONFIG_ALTIVEC) += $(call as-option,-Wa$(comma)-maltivec)
>  cpu-as-$(CONFIG_E200) += -Wa,-me200
>  cpu-as-$(CONFIG_E500) += -Wa,-me500
> -cpu-as-$(CONFIG_PPC_BOOK3S_64) += -Wa,-mpower4
> +cpu-as-$(CONFIG_PPC_BOOK3S_64) += $(call as-option,-Wa$(comma)-mpower8,-Wa$(comma)-mpower4)
>  cpu-as-$(CONFIG_PPC_E500MC) += $(call as-option,-Wa$(comma)-me500mc)
>
>  KBUILD_AFLAGS += $(cpu-as-y)
> --
> 2.17.1
>

-- 
Thanks,
~Nick Desaulniers
Re: [PATCH v2 2/3] watchdog: mpc8xxx: provide boot status
On Fri, Sep 14, 2018 at 01:32:01PM +, Christophe Leroy wrote: > mpc8xxx watchdog driver supports the following platforms: > - mpc8xx > - mpc83xx > - mpc86xx > > Those three platforms have a 32 bits register which provides the > reason of the last boot, including whether it was caused by the > watchdog. > > mpc8xx: Register RSR, bit SWRS (bit 3) > mpc83xx: Register RSR, bit SWRS (bit 28) > mpc86xx: Register RSTRSCR, bit WDT_RR (bit 11) > > This patch maps the register as defined in the device tree and updates > wdt.bootstatus based on the value of the watchdog related bit. Then > the information can be retrieved via the WDIOC_GETBOOTSTATUS ioctl. > > Hereunder is an example of devicetree for mpc8xx, > the Reset Status Register being at offset 0x288: > > WDT: watchdog@0 { > compatible = "fsl,mpc823-wdt"; > reg = <0x0 0x10 0x288 0x4>; > }; > > On the mpc83xx, RSR is at offset 0x910 > On the mpc86xx, RSTRSCR is at offset 0xe0094 > > Suggested-by: Radu Rendec > Tested-by: Christophe Leroy # On mpc885 > Signed-off-by: Christophe Leroy > --- > drivers/watchdog/mpc8xxx_wdt.c | 20 > 1 file changed, 20 insertions(+) > > diff --git a/drivers/watchdog/mpc8xxx_wdt.c b/drivers/watchdog/mpc8xxx_wdt.c > index 1dcf5f10cdd9..4a4700458b17 100644 > --- a/drivers/watchdog/mpc8xxx_wdt.c > +++ b/drivers/watchdog/mpc8xxx_wdt.c > @@ -47,6 +47,7 @@ struct mpc8xxx_wdt { > struct mpc8xxx_wdt_type { > int prescaler; > bool hw_enabled; > + u32 rsr_mask; > }; > > struct mpc8xxx_wdt_ddata { > @@ -136,6 +137,7 @@ static int mpc8xxx_wdt_probe(struct platform_device > *ofdev) > u32 freq = fsl_get_sys_freq(); > bool enabled; > struct device *dev = >dev; > + u32 __iomem *rsr = NULL; > > wdt_type = of_device_get_match_data(dev); > if (!wdt_type) > @@ -159,6 +161,21 @@ static int mpc8xxx_wdt_probe(struct platform_device > *ofdev) > return -ENODEV; > } > > + res = platform_get_resource(ofdev, IORESOURCE_MEM, 1); > + if (res) > + rsr = ioremap(res->start, resource_size(res)); > + if (rsr) { This if() can 
be inside the first if(), and it should be something like

	if (res) {
		rsr = ioremap(res->start, resource_size(res));
		if (!rsr) {
			dev_err(...);
			return -ENOMEM;
		}
		...
	}
	...

because _if_ the resource is provided in dt it should be valid.

Thanks,
Guenter

> + bool status = in_be32(rsr) & wdt_type->rsr_mask; > + > + ddata->wdd.bootstatus = status ? WDIOF_CARDRESET : 0; > + /* clear reset status bits related to watchdog timer */ > + out_be32(rsr, wdt_type->rsr_mask); > + iounmap(rsr); > + > + dev_info(dev, "Last boot was %scaused by watchdog\n", > + status ? "" : "not "); > + } > + > spin_lock_init(>lock); > > ddata->wdd.info = _wdt_info, > @@ -216,6 +233,7 @@ static const struct of_device_id mpc8xxx_wdt_match[] = { > .compatible = "mpc83xx_wdt", > .data = &(struct mpc8xxx_wdt_type) { > .prescaler = 0x1, > + .rsr_mask = BIT(3), /* RSR Bit SWRS */ > }, > }, > { > @@ -223,6 +241,7 @@ static const struct of_device_id mpc8xxx_wdt_match[] = { > .data = &(struct mpc8xxx_wdt_type) { > .prescaler = 0x1, > .hw_enabled = true, > + .rsr_mask = BIT(20), /* RSTRSCR Bit WDT_RR */ > }, > }, > { > @@ -230,6 +249,7 @@ static const struct of_device_id mpc8xxx_wdt_match[] = { > .data = &(struct mpc8xxx_wdt_type) { > .prescaler = 0x800, > .hw_enabled = true, > + .rsr_mask = BIT(28), /* RSR Bit SWRS */ > }, > }, > {}, > -- > 2.13.3 >
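Editorial note on the bit numbers: the commit message quotes bit positions 3, 28 and 11, while the diff uses BIT(28), BIT(3) and BIT(20). The two agree if the commit message counts bits from the most significant end of the 32-bit register (bit 0 = MSB), which appears to be the convention used here, whereas the kernel's BIT() counts from the least significant end. A minimal sketch of that conversion follows; the helper names are hypothetical and not part of the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): convert a bit position given
 * in MSB-0 numbering (bit 0 is the most significant bit of a 32-bit
 * register) into a mask.
 */
static inline uint32_t msb0_to_mask32(unsigned int msb0_bit)
{
	return UINT32_C(1) << (31u - msb0_bit);
}

/* Same value the kernel's BIT(n) would give for n < 32 (LSB-0 numbering). */
static inline uint32_t lsb0_mask32(unsigned int n)
{
	return UINT32_C(1) << n;
}
```

Under that assumption, SWRS "bit 3" on the mpc8xx corresponds to BIT(28), SWRS "bit 28" on the mpc83xx to BIT(3), and WDT_RR "bit 11" on the mpc86xx to BIT(20), matching the rsr_mask values in the diff.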
Re: [PATCH v2 3/3] dt-bindings: watchdog: add mpc8xxx-wdt support
On Fri, Sep 14, 2018 at 01:32:03PM +, Christophe Leroy wrote: > Add description of DT bindings for mpc8xxx-wdt driver which > handles the CPU watchdog timer on the mpc83xx, mpc86xx and mpc8xx. > > Signed-off-by: Christophe Leroy > --- > .../devicetree/bindings/watchdog/mpc8xxx-wdt.txt | 25 > ++ > 1 file changed, 25 insertions(+) > create mode 100644 Documentation/devicetree/bindings/watchdog/mpc8xxx-wdt.txt > > diff --git a/Documentation/devicetree/bindings/watchdog/mpc8xxx-wdt.txt > b/Documentation/devicetree/bindings/watchdog/mpc8xxx-wdt.txt > new file mode 100644 > index ..1d99e1e4d306 > --- /dev/null > +++ b/Documentation/devicetree/bindings/watchdog/mpc8xxx-wdt.txt > @@ -0,0 +1,25 @@ > +* Freescale mpc8xxx watchdog driver (For 83xx, 86xx and 8xx) > + > +Required properties: > +- compatible: Shall contain one of the following: > + "mpc83xx_wdt" for an mpc83xx > + "fsl,mpc8610-wdt" for an mpc86xx > + "fsl,mpc823-wdt" for an mpc8xx > +- reg: base physical address and length of the area hosting the > + watchdog registers. > + On the 83xx, "Watchdog Timer Registers" area: <0x200 0x100> > + On the 86xx, "Watchdog Timer Registers" area: <0xe4000 0x100> > + On the 8xx, "General System Interface Unit" area: <0x0 0x10> > + Note for Rob: The above has been implemented for several years. This is merely to document the current implementation. Maybe "mpc83xx_wdt" should be deprecated and replaced, but I think that should be a separate patch. > +Optional properties: > +- reg: additionnal physical address and length (4) of location of the s/additionnal/additional/ > + Reset Status Register (called RSTRSCR on the mpc86xx) > + On the 83xx, it is located at offset 0x910 > + On the 86xx, it is located at offset 0xe0094 > + On the 8xx, it is located at offset 0x288 > + > +Example: > + WDT: watchdog@0 { > + compatible = "fsl,mpc823-wdt"; > + reg = <0x0 0x10 0x288 0x4>; > + }; > -- > 2.13.3 >
Re: [PATCH v2 1/3] watchdog: mpc8xxx: use dev_xxxx() instead of pr_xxxx()
On Fri, Sep 14, 2018 at 01:31:59PM +, Christophe Leroy wrote: > mpc8xxx watchdog driver is a platform device drivers, it is > therefore possible to use dev_xxx() messaging rather than pr_xxx() > > Signed-off-by: Christophe Leroy Reviewed-by: Guenter Roeck > --- > drivers/watchdog/mpc8xxx_wdt.c | 24 > 1 file changed, 12 insertions(+), 12 deletions(-) > > diff --git a/drivers/watchdog/mpc8xxx_wdt.c b/drivers/watchdog/mpc8xxx_wdt.c > index aca2d6323f8a..1dcf5f10cdd9 100644 > --- a/drivers/watchdog/mpc8xxx_wdt.c > +++ b/drivers/watchdog/mpc8xxx_wdt.c > @@ -17,8 +17,6 @@ > * option) any later version. > */ > > -#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > - > #include > #include > #include > @@ -137,26 +135,27 @@ static int mpc8xxx_wdt_probe(struct platform_device > *ofdev) > struct mpc8xxx_wdt_ddata *ddata; > u32 freq = fsl_get_sys_freq(); > bool enabled; > + struct device *dev = >dev; > > - wdt_type = of_device_get_match_data(>dev); > + wdt_type = of_device_get_match_data(dev); > if (!wdt_type) > return -EINVAL; > > if (!freq || freq == -1) > return -EINVAL; > > - ddata = devm_kzalloc(>dev, sizeof(*ddata), GFP_KERNEL); > + ddata = devm_kzalloc(dev, sizeof(*ddata), GFP_KERNEL); > if (!ddata) > return -ENOMEM; > > res = platform_get_resource(ofdev, IORESOURCE_MEM, 0); > - ddata->base = devm_ioremap_resource(>dev, res); > + ddata->base = devm_ioremap_resource(dev, res); > if (IS_ERR(ddata->base)) > return PTR_ERR(ddata->base); > > enabled = in_be32(>base->swcrr) & SWCRR_SWEN; > if (!enabled && wdt_type->hw_enabled) { > - pr_info("could not be enabled in software\n"); > + dev_info(dev, "could not be enabled in software\n"); > return -ENODEV; > } > > @@ -166,7 +165,7 @@ static int mpc8xxx_wdt_probe(struct platform_device > *ofdev) > ddata->wdd.ops = _wdt_ops, > > ddata->wdd.timeout = WATCHDOG_TIMEOUT; > - watchdog_init_timeout(>wdd, timeout, >dev); > + watchdog_init_timeout(>wdd, timeout, dev); > > watchdog_set_nowayout(>wdd, nowayout); > > @@ -189,12 +188,13 @@ static 
int mpc8xxx_wdt_probe(struct platform_device > *ofdev) > > ret = watchdog_register_device(>wdd); > if (ret) { > - pr_err("cannot register watchdog device (err=%d)\n", ret); > + dev_err(dev, "cannot register watchdog device (err=%d)\n", ret); > return ret; > } > > - pr_info("WDT driver for MPC8xxx initialized. mode:%s timeout=%d sec\n", > - reset ? "reset" : "interrupt", ddata->wdd.timeout); > + dev_info(dev, > + "WDT driver for MPC8xxx initialized. mode:%s timeout=%d sec\n", > + reset ? "reset" : "interrupt", ddata->wdd.timeout); > > platform_set_drvdata(ofdev, ddata); > return 0; > @@ -204,8 +204,8 @@ static int mpc8xxx_wdt_remove(struct platform_device > *ofdev) > { > struct mpc8xxx_wdt_ddata *ddata = platform_get_drvdata(ofdev); > > - pr_crit("Watchdog removed, expect the %s soon!\n", > - reset ? "reset" : "machine check exception"); > + dev_crit(>dev, "Watchdog removed, expect the %s soon!\n", > + reset ? "reset" : "machine check exception"); > watchdog_unregister_device(>wdd); > > return 0; > -- > 2.13.3 >
Re: KVM: PPC: Book3S HV: Fix guest r11 corruption with POWER9 TM workarounds
On 2018-09-14 22:26, sathn...@linux.vnet.ibm.com wrote: Date: Thu, 13 Sep 2018 15:33:47 +1000 From: Michael Neuling To: m...@ellerman.id.au Cc: linuxppc-dev@lists.ozlabs.org, kvm-...@vger.kernel.org, pau...@ozlabs.org, sjitindarsi...@gmail.com, mi...@neuling.org Subject: KVM: PPC: Book3S HV: Fix guest r11 corruption with POWER9 TM workarounds When we come into the softpatch handler (0x1500), we use r11 to store the HSRR0 for later use by the denorm handler. We also use the softpatch handler for the TM workarounds for POWER9. Unfortunately, in kvmppc_interrupt_hv we later store r11 out to the vcpu assuming it's still what we got from userspace. This causes r11 to be corrupted in the VCPU and hence when we restore the guest, we get a corrupted r11. We've seen this when running TM tests inside guests on P9. This fixes the problem by only touching r11 in the denorm case. Fixes: 4bb3c7a020 ("KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9") Cc: # 4.17+ Tested-by: Suraj Jitindar Singh Reviewed-by: Paul Mackerras Signed-off-by: Michael Neuling --- arch/powerpc/kernel/exceptions-64s.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Tested-by: Satheesh Rajendran Test details: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1792501 Regards, -Satheesh. diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index ea04dfb8c0..2d8fc8c9da 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1314,9 +1314,7 @@ EXC_REAL_BEGIN(denorm_exception_hv, 0x1500, 0x100) #ifdef CONFIG_PPC_DENORMALISATION mfspr r10,SPRN_HSRR1 - mfspr r11,SPRN_HSRR0 /* save HSRR0 */ andis. r10,r10,(HSRR1_DENORM)@h /* denorm? */ - addi r11,r11,-4 /* HSRR0 is next instruction */ bne+ denorm_assist #endif @@ -1382,6 +1380,8 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) */ XVCPSGNDP32(32) denorm_done: + mfspr r11,SPRN_HSRR0 + subi r11,r11,4 mtspr SPRN_HSRR0,r11 mtcrf 0x80,r9 ld r9,PACA_EXGEN+EX_R9(r13)
Re: [PATCH] powerpc/fadump: re-register firmware-assisted dump if already registered
On Friday 14 September 2018 07:58 PM, Petr Tesarik wrote:

On Fri, 14 Sep 2018 19:36:02 +0530 Hari Bathini wrote:

Firmware-Assisted Dump (FADump) needs to be registered again after any memory hot add/remove operation to update the crash memory ranges. But currently, the kernel returns '-EEXIST' if we try to register without unregistering it first. This could expose the system to racing issues while unregistering and registering FADump from userspace during udev events. Spare the userspace of this and let it be taken care of in the kernel space for a simpler interface.

Since this change, running 'echo 1 > /sys/kernel/fadump_registered' would result in re-registering (unregistering and registering) FADump, if it was already registered.

Great improvement to the API! Any suggestions what should be done in a client which tries to be compatible with kernels before this change and after this change?

If `echo 1 > /sys/kernel/fadump_registered` fails, check the output of `cat /sys/kernel/fadump_registered`; if it is still `1`, that indicates an old kernel and that we are already registered. Treat that as success if being registered is all we care about, or unregister and then register again if re-registering is the intention.

Hope that helps.

Thanks
Hari
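Editorial note: the compatibility check described in the reply can be sketched in C roughly as follows. This is only an illustration of the "being registered is what we care about" case; the function name `fadump_ensure_registered` is hypothetical, and the path is passed as a parameter (normally `/sys/kernel/fadump_registered`) so the logic can be tried against an ordinary file.

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns 0 if FADump is registered when we are
 * done, -1 otherwise.
 */
static int fadump_ensure_registered(const char *path)
{
	FILE *f;
	int c;
	int write_failed = 0;

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs("1", f) == EOF)
		write_failed = 1;
	/*
	 * stdio buffers the data, so an error from the kernel (such as
	 * EEXIST on an old kernel) may only show up when the stream is
	 * flushed at close time.
	 */
	if (fclose(f) == EOF)
		write_failed = 1;
	if (!write_failed)
		return 0;

	/* The write failed: see whether we are registered anyway. */
	f = fopen(path, "r");
	if (!f)
		return -1;
	c = fgetc(f);
	fclose(f);

	/* Still '1': old kernel, already registered; treat as success. */
	return c == '1' ? 0 : -1;
}
```

A client that really needs a fresh registration on an old kernel would instead write `0` and then `1` when the first write fails, as the reply suggests.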
[PATCH 12/12] powerpc/64s/hash: Add a SLB preload cache
When switching processes, currently all user SLBEs are cleared, and a few (exec_base, pc, and stack) are preloaded. In trivial testing with small apps, this tends to miss the heap and low 256MB segments, and it will also miss commonly accessed segments on large memory workloads.

Add a simple round-robin preload cache that just inserts the last SLB miss into the head of the cache and preloads those at context switch time. Every 256 context switches, the oldest entry is removed from the cache to shrink the cache and require fewer slbmte if they are unused.

Much more could go into this, including into the SLB entry reclaim side to track some LRU information etc, which would require a study of large memory workloads. But this is a simple thing we can do now that is an obvious win for common workloads.

With the full series, process switching speed on the context_switch benchmark on POWER9/hash (with kernel speculation security measures disabled) increases from 140K/s to 178K/s (27%). POWER8 does not change much (within 1%); it's unclear why it does not see a big gain like POWER9.

Booting to busybox init with 256MB segments has SLB misses go down from 945 to 69, and with 1T segments 900 to 21. These could almost all be eliminated by preloading a bit more carefully with ELF binary loading.
Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/processor.h | 1 + arch/powerpc/include/asm/thread_info.h | 5 + arch/powerpc/kernel/process.c | 7 ++ arch/powerpc/mm/mmu_context_book3s64.c | 4 + arch/powerpc/mm/slb.c | 166 +++-- 5 files changed, 143 insertions(+), 40 deletions(-) diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index 12b76ecdc57d..936795acba48 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -273,6 +273,7 @@ struct thread_struct { #endif /* CONFIG_HAVE_HW_BREAKPOINT */ struct arch_hw_breakpoint hw_brk; /* info on the hardware breakpoint */ unsigned long trap_nr;/* last trap # on this thread */ + u8 load_slb;/* Ages out SLB preload cache entries */ u8 load_fp; #ifdef CONFIG_ALTIVEC u8 load_vec; diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h index f9a442bb5a72..9e78b7d26b64 100644 --- a/arch/powerpc/include/asm/thread_info.h +++ b/arch/powerpc/include/asm/thread_info.h @@ -29,6 +29,7 @@ #include #include +#define SLB_PRELOAD_NR 16U /* * low level task data. 
*/ @@ -44,6 +45,10 @@ struct thread_info { #if defined(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) && defined(CONFIG_PPC32) struct cpu_accounting_data accounting; #endif + unsigned char slb_preload_nr; + unsigned char slb_preload_tail; + u32 slb_preload_esid[SLB_PRELOAD_NR]; + /* low level flags - has atomic operations done on it */ unsigned long flags cacheline_aligned_in_smp; }; diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index e4feb45ae4c6..03c2e1f134bc 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1719,6 +1719,8 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, return 0; } +void preload_new_slb_context(unsigned long start, unsigned long sp); + /* * Set up a thread for executing a new program */ @@ -1726,6 +1728,10 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp) { #ifdef CONFIG_PPC64 unsigned long load_addr = regs->gpr[2]; /* saved by ELF_PLAT_INIT */ + +#ifdef CONFIG_PPC_BOOK3S_64 + preload_new_slb_context(start, sp); +#endif #endif /* @@ -1816,6 +1822,7 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp) #ifdef CONFIG_VSX current->thread.used_vsr = 0; #endif + current->thread.load_slb = 0; current->thread.load_fp = 0; memset(>thread.fp_state, 0, sizeof(current->thread.fp_state)); current->thread.fp_save_area = NULL; diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c index f7352c66b6b8..510f103d7813 100644 --- a/arch/powerpc/mm/mmu_context_book3s64.c +++ b/arch/powerpc/mm/mmu_context_book3s64.c @@ -53,6 +53,8 @@ int hash__alloc_context_id(void) } EXPORT_SYMBOL_GPL(hash__alloc_context_id); +void slb_setup_new_exec(void); + static int hash__init_new_context(struct mm_struct *mm) { int index; @@ -87,6 +89,8 @@ static int hash__init_new_context(struct mm_struct *mm) void hash__setup_new_exec(void) { slice_setup_new_exec(); + + slb_setup_new_exec(); } static int 
radix__init_new_context(struct mm_struct *mm) diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c index 98521fec3536..d200728fe41b 100644 --- a/arch/powerpc/mm/slb.c +++ b/arch/powerpc/mm/slb.c @@ -187,41 +187,119 @@ void slb_vmalloc_update(void)
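Editorial note: the slb.c hunk is cut off in this archive, so the following is only a user-space sketch of the preload cache as the commit message describes it, not the code from the patch. It assumes a 16-entry ring (matching SLB_PRELOAD_NR above) with a tail index and an entry count, plus an 8-bit counter that wraps every 256 context switches to age out the oldest entry. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRELOAD_NR 16U	/* same size as SLB_PRELOAD_NR in the patch */

/* Hypothetical stand-in for the fields added to thread_info/thread_struct. */
struct preload_cache {
	unsigned char nr;	/* number of valid entries */
	unsigned char tail;	/* index of the oldest entry */
	unsigned char age;	/* wraps to 0 every 256 context switches */
	uint32_t esid[PRELOAD_NR];
};

/* Is this ESID already cached? */
static bool preload_hit(const struct preload_cache *c, uint32_t esid)
{
	for (unsigned int i = 0; i < c->nr; i++) {
		if (c->esid[(c->tail + i) % PRELOAD_NR] == esid)
			return true;
	}
	return false;
}

/* Record the segment of the latest SLB miss; returns false if already cached. */
static bool preload_add(struct preload_cache *c, uint32_t esid)
{
	unsigned int idx;

	if (preload_hit(c, esid))
		return false;

	idx = (c->tail + c->nr) % PRELOAD_NR;
	c->esid[idx] = esid;
	if (c->nr == PRELOAD_NR)
		c->tail = (c->tail + 1) % PRELOAD_NR;	/* overwrote the oldest */
	else
		c->nr++;
	return true;
}

/* Called once per context switch: drop the oldest entry every 256 calls. */
static void preload_context_switch(struct preload_cache *c)
{
	c->age++;	/* 8-bit counter, wraps from 255 to 0 */
	if (c->age == 0 && c->nr > 0) {
		c->nr--;
		c->tail = (c->tail + 1) % PRELOAD_NR;
	}
}
```

At context switch time the real code would then walk the valid entries and issue an slbmte for each; that part depends on kernel internals and is left out of the sketch.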
[PATCH 11/12] powerpc/64s/hash: provide arch_setup_exec hooks for hash slice setup
This will be used by the SLB code in the next patch, but for now this sets the slb_addr_limit to the correct size for 32-bit tasks. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/book3s/64/mmu-hash.h | 2 ++ arch/powerpc/include/asm/slice.h | 1 + arch/powerpc/include/asm/thread_info.h| 6 ++ arch/powerpc/kernel/process.c | 9 + arch/powerpc/mm/mmu_context_book3s64.c| 5 + arch/powerpc/mm/slice.c | 14 ++ 6 files changed, 37 insertions(+) diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h index 4c8d413ce99a..fc68058554fa 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h @@ -487,6 +487,8 @@ int htab_remove_mapping(unsigned long vstart, unsigned long vend, extern void pseries_add_gpage(u64 addr, u64 page_size, unsigned long number_of_pages); extern void demote_segment_4k(struct mm_struct *mm, unsigned long addr); +extern void hash__setup_new_exec(void); + #ifdef CONFIG_PPC_PSERIES void hpte_init_pseries(void); #else diff --git a/arch/powerpc/include/asm/slice.h b/arch/powerpc/include/asm/slice.h index e40406cf5628..a595461c9cb0 100644 --- a/arch/powerpc/include/asm/slice.h +++ b/arch/powerpc/include/asm/slice.h @@ -32,6 +32,7 @@ void slice_set_range_psize(struct mm_struct *mm, unsigned long start, unsigned long len, unsigned int psize); void slice_init_new_context_exec(struct mm_struct *mm); +void slice_setup_new_exec(void); #endif /* __ASSEMBLY__ */ diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h index 3c0002044bc9..f9a442bb5a72 100644 --- a/arch/powerpc/include/asm/thread_info.h +++ b/arch/powerpc/include/asm/thread_info.h @@ -72,6 +72,12 @@ static inline struct thread_info *current_thread_info(void) } extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src); + +#ifdef CONFIG_PPC_BOOK3S_64 +void arch_setup_new_exec(void); +#define arch_setup_new_exec arch_setup_new_exec 
+#endif + #endif /* __ASSEMBLY__ */ /* diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 913c5725cdb2..e4feb45ae4c6 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1482,6 +1482,15 @@ void flush_thread(void) #endif /* CONFIG_HAVE_HW_BREAKPOINT */ } +#ifdef CONFIG_PPC_BOOK3S_64 +void arch_setup_new_exec(void) +{ + if (radix_enabled()) + return; + hash__setup_new_exec(); +} +#endif + int set_thread_uses_vas(void) { #ifdef CONFIG_PPC_BOOK3S_64 diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c index dbd8f762140b..f7352c66b6b8 100644 --- a/arch/powerpc/mm/mmu_context_book3s64.c +++ b/arch/powerpc/mm/mmu_context_book3s64.c @@ -84,6 +84,11 @@ static int hash__init_new_context(struct mm_struct *mm) return index; } +void hash__setup_new_exec(void) +{ + slice_setup_new_exec(); +} + static int radix__init_new_context(struct mm_struct *mm) { unsigned long rts_field; diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c index 606f424aac47..fc5b3a1ec666 100644 --- a/arch/powerpc/mm/slice.c +++ b/arch/powerpc/mm/slice.c @@ -746,6 +746,20 @@ void slice_init_new_context_exec(struct mm_struct *mm) bitmap_fill(mask->high_slices, SLICE_NUM_HIGH); } +#ifdef CONFIG_PPC_BOOK3S_64 +void slice_setup_new_exec(void) +{ + struct mm_struct *mm = current->mm; + + slice_dbg("slice_setup_new_exec(mm=%p)\n", mm); + + if (!is_32bit_task()) + return; + + mm->context.slb_addr_limit = DEFAULT_MAP_WINDOW; +} +#endif + void slice_set_range_psize(struct mm_struct *mm, unsigned long start, unsigned long len, unsigned int psize) { -- 2.18.0
[PATCH 10/12] powerpc/64s: xmon do not dump hash fields when using radix mode
Signed-off-by: Nicholas Piggin --- arch/powerpc/xmon/xmon.c | 40 +--- 1 file changed, 21 insertions(+), 19 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index 323aac8321fa..5dec84aba59e 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -2378,30 +2378,32 @@ static void dump_one_paca(int cpu) DUMP(p, cpu_start, "%#-*x"); DUMP(p, kexec_state, "%#-*x"); #ifdef CONFIG_PPC_BOOK3S_64 - for (i = 0; i < SLB_NUM_BOLTED; i++) { - u64 esid, vsid; + if (!early_radix_enabled()) { + for (i = 0; i < SLB_NUM_BOLTED; i++) { + u64 esid, vsid; - if (!p->slb_shadow_ptr) - continue; + if (!p->slb_shadow_ptr) + continue; - esid = be64_to_cpu(p->slb_shadow_ptr->save_area[i].esid); - vsid = be64_to_cpu(p->slb_shadow_ptr->save_area[i].vsid); + esid = be64_to_cpu(p->slb_shadow_ptr->save_area[i].esid); + vsid = be64_to_cpu(p->slb_shadow_ptr->save_area[i].vsid); - if (esid || vsid) { - printf(" %-*s[%d] = 0x%016llx 0x%016llx\n", - 22, "slb_shadow", i, esid, vsid); + if (esid || vsid) { + printf(" %-*s[%d] = 0x%016llx 0x%016llx\n", + 22, "slb_shadow", i, esid, vsid); + } } - } - DUMP(p, vmalloc_sllp, "%#-*x"); - DUMP(p, stab_rr, "%#-*x"); - DUMP(p, slb_used_bitmap, "%#-*x"); - DUMP(p, slb_kern_bitmap, "%#-*x"); + DUMP(p, vmalloc_sllp, "%#-*x"); + DUMP(p, stab_rr, "%#-*x"); + DUMP(p, slb_used_bitmap, "%#-*x"); + DUMP(p, slb_kern_bitmap, "%#-*x"); - if (!early_cpu_has_feature(CPU_FTR_ARCH_300)) { - DUMP(p, slb_cache_ptr, "%#-*x"); - for (i = 0; i < SLB_CACHE_ENTRIES; i++) - printf(" %-*s[%d] = 0x%016x\n", - 22, "slb_cache", i, p->slb_cache[i]); + if (!early_cpu_has_feature(CPU_FTR_ARCH_300)) { + DUMP(p, slb_cache_ptr, "%#-*x"); + for (i = 0; i < SLB_CACHE_ENTRIES; i++) + printf(" %-*s[%d] = 0x%016x\n", + 22, "slb_cache", i, p->slb_cache[i]); + } } DUMP(p, rfi_flush_fallback_area, "%-*px"); -- 2.18.0
[PATCH 09/12] powerpc/64s/hash: SLB allocation status bitmaps
Add 32-entry bitmaps to track the allocation status of the first 32 SLB entries, and whether they are user or kernel entries. These are used to allocate free SLB entries first, before resorting to the round robin allocator. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/paca.h | 6 ++- arch/powerpc/kernel/asm-offsets.c | 2 +- arch/powerpc/mm/slb.c | 62 +-- arch/powerpc/xmon/xmon.c | 4 +- 4 files changed, 58 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h index 8c258a057207..bf7ab59be3b8 100644 --- a/arch/powerpc/include/asm/paca.h +++ b/arch/powerpc/include/asm/paca.h @@ -113,7 +113,10 @@ struct paca_struct { * on the linear mapping */ /* SLB related definitions */ u16 vmalloc_sllp; - u16 slb_cache_ptr; + u8 slb_cache_ptr; + u8 stab_rr; /* stab/slb round-robin counter */ + u32 slb_used_bitmap;/* Bitmaps for first 32 SLB entries. */ + u32 slb_kern_bitmap; u32 slb_cache[SLB_CACHE_ENTRIES]; #endif /* CONFIG_PPC_BOOK3S_64 */ @@ -148,7 +151,6 @@ struct paca_struct { */ struct task_struct *__current; /* Pointer to current */ u64 kstack; /* Saved Kernel stack addr */ - u64 stab_rr;/* stab/slb round-robin counter */ u64 saved_r1; /* r1 save for RTAS calls or PM or EE=0 */ u64 saved_msr; /* MSR saved here by enter_rtas */ u16 trap_save; /* Used when bad stack is encountered */ diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 43b67ead5b97..1f79cbf3da62 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -173,7 +173,6 @@ int main(void) OFFSET(PACAKSAVE, paca_struct, kstack); OFFSET(PACACURRENT, paca_struct, __current); OFFSET(PACASAVEDMSR, paca_struct, saved_msr); - OFFSET(PACASTABRR, paca_struct, stab_rr); OFFSET(PACAR1, paca_struct, saved_r1); OFFSET(PACATOC, paca_struct, kernel_toc); OFFSET(PACAKBASE, paca_struct, kernelbase); @@ -203,6 +202,7 @@ int main(void) #ifdef CONFIG_PPC_BOOK3S_64 OFFSET(PACASLBCACHE, paca_struct, 
slb_cache); OFFSET(PACASLBCACHEPTR, paca_struct, slb_cache_ptr); + OFFSET(PACASTABRR, paca_struct, stab_rr); OFFSET(PACAVMALLOCSLLP, paca_struct, vmalloc_sllp); #ifdef CONFIG_PPC_MM_SLICES OFFSET(MMUPSIZESLLP, mmu_psize_def, sllp); diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c index d782a70d4a5d..98521fec3536 100644 --- a/arch/powerpc/mm/slb.c +++ b/arch/powerpc/mm/slb.c @@ -122,6 +122,9 @@ void slb_restore_bolted_realmode(void) { __slb_restore_bolted_realmode(); get_paca()->slb_cache_ptr = 0; + + get_paca()->slb_kern_bitmap = (1U << SLB_NUM_BOLTED) - 1; + get_paca()->slb_used_bitmap = get_paca()->slb_kern_bitmap; } /* @@ -129,9 +132,6 @@ void slb_restore_bolted_realmode(void) */ void slb_flush_all_realmode(void) { - /* -* This flushes all SLB entries including 0, so it must be realmode. -*/ asm volatile("slbmte %0,%0; slbia" : : "r" (0)); } @@ -177,6 +177,9 @@ void slb_flush_and_rebolt(void) : "memory"); get_paca()->slb_cache_ptr = 0; + + get_paca()->slb_kern_bitmap = (1U << SLB_NUM_BOLTED) - 1; + get_paca()->slb_used_bitmap = get_paca()->slb_kern_bitmap; } void slb_vmalloc_update(void) @@ -273,10 +276,13 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm) "isync" :: "r"(ksp_vsid_data), "r"(ksp_esid_data)); + + get_paca()->slb_kern_bitmap = (1U << SLB_NUM_BOLTED) - 1; } get_paca()->slb_cache_ptr = 0; } + get_paca()->slb_used_bitmap = get_paca()->slb_kern_bitmap; /* * preload some userspace segments into the SLB. @@ -349,6 +355,8 @@ void slb_initialize(void) } get_paca()->stab_rr = SLB_NUM_BOLTED - 1; + get_paca()->slb_kern_bitmap = (1U << SLB_NUM_BOLTED) - 1; + get_paca()->slb_used_bitmap = get_paca()->slb_kern_bitmap; lflags = SLB_VSID_KERNEL | linear_llp; @@ -400,17 +408,47 @@ static void slb_cache_update(unsigned long esid_data) } } -static enum slb_index alloc_slb_index(void) +static enum slb_index alloc_slb_index(bool kernel) { enum slb_index index; - /* round-robin replacement of slb starting at SLB_NUM_BOLTED. 
*/ - index = get_paca()->stab_rr; - if (index < (mmu_slb_size - 1)) - index++; - else - index = SLB_NUM_BOLTED; - get_paca()->stab_rr = index; +
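Editorial note: the new body of alloc_slb_index() is cut off in this archive. The sketch below shows one way the allocation described in the commit message could work: take the lowest free entry from the 32-bit "used" bitmap first, and fall back to the round-robin counter only when all 32 tracked entries are in use. The struct, the function names and the values of NUM_BOLTED and SLB_SIZE are assumptions for illustration, not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_BOLTED 2U	/* assumed number of bolted entries */
#define SLB_SIZE   32U	/* assumed SLB size */

/* Hypothetical stand-in for the paca fields touched by the patch. */
struct slb_state {
	uint32_t used_bitmap;	/* allocated entries among the first 32 */
	uint32_t kern_bitmap;	/* which of those hold kernel mappings */
	uint8_t rr;		/* round-robin counter */
};

static void slb_state_init(struct slb_state *s)
{
	/* Bolted entries are kernel entries and always in use. */
	s->kern_bitmap = (1U << NUM_BOLTED) - 1;
	s->used_bitmap = s->kern_bitmap;
	s->rr = NUM_BOLTED - 1;
}

static unsigned int alloc_index(struct slb_state *s, bool kernel)
{
	unsigned int index;

	if (s->used_bitmap != UINT32_MAX) {
		/* A free entry exists: take the lowest clear bit. */
		index = 0;
		while (s->used_bitmap & (1U << index))
			index++;
		s->used_bitmap |= 1U << index;
		if (kernel)
			s->kern_bitmap |= 1U << index;
		return index;
	}

	/* Everything tracked is in use: round-robin above the bolted entries. */
	index = s->rr;
	if (index < SLB_SIZE - 1)
		index++;
	else
		index = NUM_BOLTED;
	s->rr = (uint8_t)index;

	/* The replaced entry changes owner; keep the kernel bitmap in sync. */
	if (index < 32) {
		if (kernel)
			s->kern_bitmap |= 1U << index;
		else
			s->kern_bitmap &= ~(1U << index);
	}
	return index;
}
```

Resetting used_bitmap to kern_bitmap, as the hunks above do after a flush or context switch, makes every user entry available again without touching the kernel ones.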
[PATCH 08/12] powerpc/64s/hash: remove user SLB data from the paca
User SLB mapping data is copied into the PACA from the mm->context so it can be accessed by the SLB miss handlers. After the C conversion, SLB miss handlers now run with relocation on, and user SLB misses are able to take recursive kernel SLB misses, so the user SLB mapping data can be removed from the paca and accessed directly. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/book3s/64/mmu-hash.h | 1 + arch/powerpc/include/asm/paca.h | 13 -- arch/powerpc/kernel/asm-offsets.c | 9 arch/powerpc/kernel/paca.c| 21 - arch/powerpc/mm/hash_utils_64.c | 46 +-- arch/powerpc/mm/mmu_context.c | 3 +- arch/powerpc/mm/slb.c | 20 +++- arch/powerpc/mm/slice.c | 29 8 files changed, 40 insertions(+), 102 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h index 20d9ca736bbd..4c8d413ce99a 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h @@ -496,6 +496,7 @@ static inline void hpte_init_pseries(void) { } extern void hpte_init_native(void); extern void slb_initialize(void); +extern void core_flush_all_slbs(struct mm_struct *mm); extern void slb_flush_and_rebolt(void); void slb_flush_all_realmode(void); void __slb_restore_bolted_realmode(void); diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h index 4331295db0f7..8c258a057207 100644 --- a/arch/powerpc/include/asm/paca.h +++ b/arch/powerpc/include/asm/paca.h @@ -143,18 +143,6 @@ struct paca_struct { struct tlb_core_data tcd; #endif /* CONFIG_PPC_BOOK3E */ -#ifdef CONFIG_PPC_BOOK3S - mm_context_id_t mm_ctx_id; -#ifdef CONFIG_PPC_MM_SLICES - unsigned char mm_ctx_low_slices_psize[BITS_PER_LONG / BITS_PER_BYTE]; - unsigned char mm_ctx_high_slices_psize[SLICE_ARRAY_SIZE]; - unsigned long mm_ctx_slb_addr_limit; -#else - u16 mm_ctx_user_psize; - u16 mm_ctx_sllp; -#endif -#endif - /* * then miscellaneous read-write fields */ @@ -256,7 +244,6 @@ struct paca_struct { #endif /*
CONFIG_PPC_PSERIES */ } cacheline_aligned; -extern void copy_mm_to_paca(struct mm_struct *mm); extern struct paca_struct **paca_ptrs; extern void initialise_paca(struct paca_struct *new_paca, int cpu); extern void setup_paca(struct paca_struct *new_paca); diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 7834256585f1..43b67ead5b97 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -181,15 +181,6 @@ int main(void) OFFSET(PACAIRQSOFTMASK, paca_struct, irq_soft_mask); OFFSET(PACAIRQHAPPENED, paca_struct, irq_happened); OFFSET(PACA_FTRACE_ENABLED, paca_struct, ftrace_enabled); -#ifdef CONFIG_PPC_BOOK3S - OFFSET(PACACONTEXTID, paca_struct, mm_ctx_id); -#ifdef CONFIG_PPC_MM_SLICES - OFFSET(PACALOWSLICESPSIZE, paca_struct, mm_ctx_low_slices_psize); - OFFSET(PACAHIGHSLICEPSIZE, paca_struct, mm_ctx_high_slices_psize); - OFFSET(PACA_SLB_ADDR_LIMIT, paca_struct, mm_ctx_slb_addr_limit); - DEFINE(MMUPSIZEDEFSIZE, sizeof(struct mmu_psize_def)); -#endif /* CONFIG_PPC_MM_SLICES */ -#endif #ifdef CONFIG_PPC_BOOK3E OFFSET(PACAPGD, paca_struct, pgd); diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c index 0ee3e6d50f28..6752e17f0281 100644 --- a/arch/powerpc/kernel/paca.c +++ b/arch/powerpc/kernel/paca.c @@ -259,24 +259,3 @@ void __init free_unused_pacas(void) paca_ptrs_size + paca_struct_size, nr_cpu_ids); } -void copy_mm_to_paca(struct mm_struct *mm) -{ -#ifdef CONFIG_PPC_BOOK3S - mm_context_t *context = >context; - - get_paca()->mm_ctx_id = context->id; -#ifdef CONFIG_PPC_MM_SLICES - VM_BUG_ON(!mm->context.slb_addr_limit); - get_paca()->mm_ctx_slb_addr_limit = mm->context.slb_addr_limit; - memcpy(_paca()->mm_ctx_low_slices_psize, - >low_slices_psize, sizeof(context->low_slices_psize)); - memcpy(_paca()->mm_ctx_high_slices_psize, - >high_slices_psize, TASK_SLICE_ARRAY_SZ(mm)); -#else /* CONFIG_PPC_MM_SLICES */ - get_paca()->mm_ctx_user_psize = context->user_psize; - 
get_paca()->mm_ctx_sllp = context->sllp; -#endif -#else /* !CONFIG_PPC_BOOK3S */ - return; -#endif -} diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c index f23a89d8e4ce..88c95dc8b141 100644 --- a/arch/powerpc/mm/hash_utils_64.c +++ b/arch/powerpc/mm/hash_utils_64.c @@ -1088,16 +1088,16 @@ unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap) } #ifdef CONFIG_PPC_MM_SLICES -static unsigned int get_paca_psize(unsigned long addr) +static unsigned int get_psize(struct mm_struct *mm, unsigned long addr) {
[PATCH 07/12] powerpc/64s/hash: convert SLB miss handlers to C
This patch moves SLB miss handlers completely to C, using the standard exception handler macros to set up the stack and branch to C. This can be done because the segment containing the kernel stack is always bolted, so accessing it with relocation on will not cause an SLB exception.

Arbitrary kernel memory may not be accessed when handling kernel space SLB misses, so care should be taken there. However user SLB misses can access any kernel memory, which can be used to move some fields out of the paca (in later patches).

User SLB misses could quite easily reconcile IRQs and set up a first class kernel environment and exit via ret_from_except, however that doesn't seem to be necessary at the moment, so we only do that if a bad fault is encountered.

[ Credit to Aneesh for bug fixes, error checks, and improvements to bad address handling, etc ]

Signed-off-by: Nicholas Piggin

Since RFC:
- Added MSR[RI] handling
- Fixed up a register loss bug exposed by irq tracing (Aneesh)
- Reject misses outside the defined kernel regions (Aneesh)
- Added several more sanity checks and error handling (Aneesh), we may look at consolidating these tests and tightening up the code but for a first pass we decided it's better to check carefully.
Since v1: - Fixed SLB cache corruption (Aneesh) - Fixed untidy SLBE allocation "leak" in get_vsid error case - Now survives some stress testing on real hardware --- arch/powerpc/include/asm/asm-prototypes.h | 2 + arch/powerpc/include/asm/exception-64s.h | 8 - arch/powerpc/kernel/exceptions-64s.S | 202 +++-- arch/powerpc/mm/Makefile | 2 +- arch/powerpc/mm/slb.c | 271 + arch/powerpc/mm/slb_low.S | 335 -- 6 files changed, 196 insertions(+), 624 deletions(-) delete mode 100644 arch/powerpc/mm/slb_low.S diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h index 1f4691ce4126..78ed3c3f879a 100644 --- a/arch/powerpc/include/asm/asm-prototypes.h +++ b/arch/powerpc/include/asm/asm-prototypes.h @@ -78,6 +78,8 @@ void kernel_bad_stack(struct pt_regs *regs); void system_reset_exception(struct pt_regs *regs); void machine_check_exception(struct pt_regs *regs); void emulation_assist_interrupt(struct pt_regs *regs); +long do_slb_fault(struct pt_regs *regs, unsigned long ea); +void do_bad_slb_fault(struct pt_regs *regs, unsigned long ea, long err); /* signals, syscalls and interrupts */ long sys_swapcontext(struct ucontext __user *old_ctx, diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index a86fead0..47578b79f0fb 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -60,14 +60,6 @@ */ #define MAX_MCE_DEPTH 4 -/* - * EX_LR is only used in EXSLB and where it does not overlap with EX_DAR - * EX_CCR similarly with DSISR, but being 4 byte registers there is a hole - * in the save area so it's not necessary to overlap them. Could be used - * for future savings though if another 4 byte register was to be saved. - */ -#define EX_LR EX_DAR - /* * EX_R3 is only used by the bad_stack handler. bad_stack reloads and * saves DAR from SPRN_DAR, and EX_DAR is not used. 
So EX_R3 can overlap diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 9dad73722d1a..c4f372ef4842 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -567,28 +567,36 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX) EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80) - SET_SCRATCH0(r13) - EXCEPTION_PROLOG_0(PACA_EXSLB) - EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST_PR, 0x380) - mr r12,r3 /* save r3 */ - mfspr r3,SPRN_DAR - mfspr r11,SPRN_SRR1 - crset 4*cr6+eq - BRANCH_TO_COMMON(r10, slb_miss_common) +EXCEPTION_PROLOG(PACA_EXSLB, data_access_slb_common, EXC_STD, KVMTEST_PR, 0x380); EXC_REAL_END(data_access_slb, 0x380, 0x80) EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80) - SET_SCRATCH0(r13) - EXCEPTION_PROLOG_0(PACA_EXSLB) - EXCEPTION_PROLOG_1(PACA_EXSLB, NOTEST, 0x380) - mr r12,r3 /* save r3 */ - mfspr r3,SPRN_DAR - mfspr r11,SPRN_SRR1 - crset 4*cr6+eq - BRANCH_TO_COMMON(r10, slb_miss_common) +EXCEPTION_RELON_PROLOG(PACA_EXSLB, data_access_slb_common, EXC_STD, NOTEST, 0x380); EXC_VIRT_END(data_access_slb, 0x4380, 0x80) + TRAMP_KVM_SKIP(PACA_EXSLB, 0x380) +EXC_COMMON_BEGIN(data_access_slb_common) + mfspr r10,SPRN_DAR + std r10,PACA_EXSLB+EX_DAR(r13) + EXCEPTION_PROLOG_COMMON(0x380, PACA_EXSLB) + ld r4,PACA_EXSLB+EX_DAR(r13) + std r4,_DAR(r1) + addir3,r1,STACK_FRAME_OVERHEAD + bl do_slb_fault + cmpdi r3,0 + bne-1f + b fast_exception_return +1: /*
[PATCH 06/12] powerpc/64s/hash: Use POWER9 SLBIA IH=3 variant in switch_slb
POWER9 introduces SLBIA IH=3, which invalidates all SLB entries and associated lookaside information that have a class value of 1, which Linux assigns to user addresses. This matches what switch_slb wants, and allows a simple fast implementation that avoids the slb_cache complexity. As a side-effect, the POWER5 < DD2.1 SLB invalidation workaround is also avoided on POWER9. Process context switching rate improves by about 2.2% for a small process that hits the slb cache, which is the best case for the current code. Signed-off-by: Nicholas Piggin --- arch/powerpc/mm/slb.c| 86 +++- arch/powerpc/xmon/xmon.c | 11 +++-- 2 files changed, 57 insertions(+), 40 deletions(-) diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c index 03fa1c663ccf..319c772f7cbd 100644 --- a/arch/powerpc/mm/slb.c +++ b/arch/powerpc/mm/slb.c @@ -209,7 +209,6 @@ static inline int esids_match(unsigned long addr1, unsigned long addr2) /* Flush all user entries from the segment table of the current processor. */ void switch_slb(struct task_struct *tsk, struct mm_struct *mm) { - unsigned long offset; unsigned long pc = KSTK_EIP(tsk); unsigned long stack = KSTK_ESP(tsk); unsigned long exec_base; @@ -221,45 +220,57 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm) * which would update the slb_cache/slb_cache_ptr fields in the PACA. 
*/ hard_irq_disable(); - offset = get_paca()->slb_cache_ptr; - if (!mmu_has_feature(MMU_FTR_NO_SLBIE_B) && - offset <= SLB_CACHE_ENTRIES) { - unsigned long slbie_data; - int i; - - asm volatile("isync" : : : "memory"); - for (i = 0; i < offset; i++) { - slbie_data = (unsigned long)get_paca()->slb_cache[i] - << SID_SHIFT; /* EA */ - slbie_data |= user_segment_size(slbie_data) - << SLBIE_SSIZE_SHIFT; - slbie_data |= SLBIE_C; /* C set for user addresses */ - asm volatile("slbie %0" : : "r" (slbie_data)); - } - - /* Workaround POWER5 < DD2.1 issue */ - if (!cpu_has_feature(CPU_FTR_ARCH_207S) && offset == 1) - asm volatile("slbie %0" : : "r" (slbie_data)); + if (cpu_has_feature(CPU_FTR_ARCH_300)) { + /* +* SLBIA IH=3 invalidates all Class=1 SLBEs and their +* associated lookaside structures, which matches what +* switch_slb wants. So ARCH_300 does not use the slb +* cache. +*/ + asm volatile("isync ; " PPC_SLBIA(3)" ; isync"); - asm volatile("isync" : : : "memory"); } else { - struct slb_shadow *p = get_slb_shadow(); - unsigned long ksp_esid_data = - be64_to_cpu(p->save_area[KSTACK_INDEX].esid); - unsigned long ksp_vsid_data = - be64_to_cpu(p->save_area[KSTACK_INDEX].vsid); - - asm volatile("isync\n" -PPC_SLBIA(1) "\n" -"slbmte%0,%1\n" -"isync" -:: "r"(ksp_vsid_data), - "r"(ksp_esid_data)); - - asm volatile("isync" : : : "memory"); + unsigned long offset = get_paca()->slb_cache_ptr; + + if (!mmu_has_feature(MMU_FTR_NO_SLBIE_B) && + offset <= SLB_CACHE_ENTRIES) { + unsigned long slbie_data; + int i; + + asm volatile("isync" : : : "memory"); + for (i = 0; i < offset; i++) { + /* EA */ + slbie_data = (unsigned long) + get_paca()->slb_cache[i] << SID_SHIFT; + slbie_data |= user_segment_size(slbie_data) + << SLBIE_SSIZE_SHIFT; + slbie_data |= SLBIE_C; /* user slbs have C=1 */ + asm volatile("slbie %0" : : "r" (slbie_data)); + } + + /* Workaround POWER5 < DD2.1 issue */ + if (!cpu_has_feature(CPU_FTR_ARCH_207S) && offset == 1) + asm volatile("slbie %0" : : "r" (slbie_data)); 
+ + asm volatile("isync" : : : "memory"); + } else { + struct slb_shadow *p = get_slb_shadow(); + unsigned long ksp_esid_data = + be64_to_cpu(p->save_area[KSTACK_INDEX].esid); + unsigned long ksp_vsid_data = +
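The branch structure that switch_slb ends up with after this patch can be modelled in plain userspace C. This is only a sketch of the decision logic as read from the hunks above: the enum labels and pick_flush() are hypothetical names, the feature tests are passed in as booleans instead of calling cpu_has_feature()/mmu_has_feature(), and no SLB instruction is issued.

```c
#include <assert.h>
#include <stdbool.h>

#define SLB_CACHE_ENTRIES 8	/* value from mmu-hash.h in this series */

/* Hypothetical labels for the three invalidation strategies. */
enum slb_flush {
	FLUSH_SLBIA_IH3,	/* POWER9: a single slbia with IH=3 */
	FLUSH_SLBIE_CACHED,	/* slbie for each ESID recorded in slb_cache */
	FLUSH_SLBIA_IH1,	/* slbia IH=1, then re-insert the stack SLBE */
};

/* Model of the choice switch_slb makes; the arguments stand in for the
 * CPU_FTR_ARCH_300 and MMU_FTR_NO_SLBIE_B tests and paca->slb_cache_ptr. */
static enum slb_flush pick_flush(bool arch_300, bool no_slbie_b,
				 unsigned long slb_cache_ptr)
{
	if (arch_300)
		return FLUSH_SLBIA_IH3;
	if (!no_slbie_b && slb_cache_ptr <= SLB_CACHE_ENTRIES)
		return FLUSH_SLBIE_CACHED;
	return FLUSH_SLBIA_IH1;
}
```

In this model, a task that faulted in more than 8 user segments on a pre-POWER9 CPU overflows the cache and falls through to the slbia IH=1 path, which is the case the previous patch in the series optimises.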
[PATCH 05/12] powerpc/64s/hash: Use POWER6 SLBIA IH=1 variant in switch_slb
On processors that support the hint (e.g., POWER6 and newer), SLBIA IH=1 removes all non-zero SLBEs but only invalidates ERAT entries associated with a class value of 1, which Linux assigns to user addresses. This prevents kernel ERAT entries from being invalidated when context switching (if the thread faulted in more than 8 user SLBEs). Signed-off-by: Nicholas Piggin --- arch/powerpc/mm/slb.c | 38 +++--- 1 file changed, 23 insertions(+), 15 deletions(-) diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c index a5e58f11d676..03fa1c663ccf 100644 --- a/arch/powerpc/mm/slb.c +++ b/arch/powerpc/mm/slb.c @@ -128,13 +128,21 @@ void slb_flush_all_realmode(void) asm volatile("slbmte %0,%0; slbia" : : "r" (0)); } -static void __slb_flush_and_rebolt(void) +void slb_flush_and_rebolt(void) { /* If you change this make sure you change SLB_NUM_BOLTED * and PR KVM appropriately too. */ unsigned long linear_llp, lflags; unsigned long ksp_esid_data, ksp_vsid_data; + WARN_ON(!irqs_disabled()); + + /* +* We can't take a PMU exception in the following code, so hard +* disable interrupts. +*/ + hard_irq_disable(); + linear_llp = mmu_psize_defs[mmu_linear_psize].sllp; lflags = SLB_VSID_KERNEL | linear_llp; @@ -160,20 +168,7 @@ static void __slb_flush_and_rebolt(void) :: "r"(ksp_vsid_data), "r"(ksp_esid_data) : "memory"); -} -void slb_flush_and_rebolt(void) -{ - - WARN_ON(!irqs_disabled()); - - /* -* We can't take a PMU exception in the following code, so hard -* disable interrupts. 
-*/ - hard_irq_disable(); - - __slb_flush_and_rebolt(); get_paca()->slb_cache_ptr = 0; } @@ -248,7 +243,20 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm) asm volatile("isync" : : : "memory"); } else { - __slb_flush_and_rebolt(); + struct slb_shadow *p = get_slb_shadow(); + unsigned long ksp_esid_data = + be64_to_cpu(p->save_area[KSTACK_INDEX].esid); + unsigned long ksp_vsid_data = + be64_to_cpu(p->save_area[KSTACK_INDEX].vsid); + + asm volatile("isync\n" +PPC_SLBIA(1) "\n" +"slbmte%0,%1\n" +"isync" +:: "r"(ksp_vsid_data), + "r"(ksp_esid_data)); + + asm volatile("isync" : : : "memory"); } get_paca()->slb_cache_ptr = 0; -- 2.18.0
[PATCH 04/12] powerpc/64s/hash: remove the vmalloc segment from the bolted SLB
Remove the vmalloc segment from bolted SLBEs. This is not required to be bolted, and seems like it was added to help pre-load the SLB on context switch. However there are now other segments like the vmemmap segment and non-zero node memory that often take misses after a context switch, so it is better to solve this in a more general way. A subsequent change will track free SLB entries and use those rather than overwriting valid entries round-robin, which makes it far less likely for kernel SLBEs to be evicted after they are installed. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/book3s/64/mmu-hash.h | 2 +- arch/powerpc/mm/slb.c | 23 --- 2 files changed, 6 insertions(+), 19 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h index b3520b549cba..20d9ca736bbd 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h @@ -30,7 +30,7 @@ * SLB */ -#define SLB_NUM_BOLTED 3 +#define SLB_NUM_BOLTED 2 #define SLB_CACHE_ENTRIES 8 #define SLB_MIN_SIZE 32 diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c index d952ece3abf7..a5e58f11d676 100644 --- a/arch/powerpc/mm/slb.c +++ b/arch/powerpc/mm/slb.c @@ -30,8 +30,7 @@ enum slb_index { LINEAR_INDEX= 0, /* Kernel linear map (0xc000) */ - VMALLOC_INDEX = 1, /* Kernel virtual map (0xd000) */ - KSTACK_INDEX= 2, /* Kernel stack map */ + KSTACK_INDEX= 1, /* Kernel stack map */ }; extern void slb_allocate(unsigned long ea); @@ -133,13 +132,11 @@ static void __slb_flush_and_rebolt(void) { /* If you change this make sure you change SLB_NUM_BOLTED * and PR KVM appropriately too. 
*/ - unsigned long linear_llp, vmalloc_llp, lflags, vflags; + unsigned long linear_llp, lflags; unsigned long ksp_esid_data, ksp_vsid_data; linear_llp = mmu_psize_defs[mmu_linear_psize].sllp; - vmalloc_llp = mmu_psize_defs[mmu_vmalloc_psize].sllp; lflags = SLB_VSID_KERNEL | linear_llp; - vflags = SLB_VSID_KERNEL | vmalloc_llp; ksp_esid_data = mk_esid_data(get_paca()->kstack, mmu_kernel_ssize, KSTACK_INDEX); if ((ksp_esid_data & ~0xfffUL) <= PAGE_OFFSET) { @@ -157,14 +154,10 @@ static void __slb_flush_and_rebolt(void) * the stack between the slbia and rebolting it. */ asm volatile("isync\n" "slbia\n" -/* Slot 1 - first VMALLOC segment */ +/* Slot 1 - kernel stack */ "slbmte%0,%1\n" -/* Slot 2 - kernel stack */ -"slbmte%2,%3\n" "isync" -:: "r"(mk_vsid_data(VMALLOC_START, mmu_kernel_ssize, vflags)), - "r"(mk_esid_data(VMALLOC_START, mmu_kernel_ssize, VMALLOC_INDEX)), - "r"(ksp_vsid_data), +:: "r"(ksp_vsid_data), "r"(ksp_esid_data) : "memory"); } @@ -186,10 +179,6 @@ void slb_flush_and_rebolt(void) void slb_vmalloc_update(void) { - unsigned long vflags; - - vflags = SLB_VSID_KERNEL | mmu_psize_defs[mmu_vmalloc_psize].sllp; - slb_shadow_update(VMALLOC_START, mmu_kernel_ssize, vflags, VMALLOC_INDEX); slb_flush_and_rebolt(); } @@ -324,7 +313,7 @@ void slb_set_size(u16 size) void slb_initialize(void) { unsigned long linear_llp, vmalloc_llp, io_llp; - unsigned long lflags, vflags; + unsigned long lflags; static int slb_encoding_inited; #ifdef CONFIG_SPARSEMEM_VMEMMAP unsigned long vmemmap_llp; @@ -360,14 +349,12 @@ void slb_initialize(void) get_paca()->stab_rr = SLB_NUM_BOLTED - 1; lflags = SLB_VSID_KERNEL | linear_llp; - vflags = SLB_VSID_KERNEL | vmalloc_llp; /* Invalidate the entire SLB (even entry 0) & all the ERATS */ asm volatile("isync":::"memory"); asm volatile("slbmte %0,%0"::"r" (0) : "memory"); asm volatile("isync; slbia; isync":::"memory"); create_shadowed_slbe(PAGE_OFFSET, mmu_kernel_ssize, lflags, LINEAR_INDEX); - create_shadowed_slbe(VMALLOC_START, 
mmu_kernel_ssize, vflags, VMALLOC_INDEX); /* For the boot cpu, we're running on the stack in init_thread_union, * which is in the first segment of the linear mapping, and also -- 2.18.0
[PATCH 03/12] powerpc/64s/hash: move POWER5 < DD2.1 slbie workaround where it is needed
The POWER5 < DD2.1 issue is that slbie needs to be issued more than once. It came in with this change:

ChangeSet@1.1608, 2004-04-29 07:12:31-07:00, da...@gibson.dropbear.id.au
  [PATCH] POWER5 erratum workaround

  Early POWER5 revisions (
---
 arch/powerpc/mm/slb.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 1c7128c63a4b..d952ece3abf7 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -226,7 +226,6 @@ static inline int esids_match(unsigned long addr1, unsigned long addr2)
 void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
 {
 	unsigned long offset;
-	unsigned long slbie_data = 0;
 	unsigned long pc = KSTK_EIP(tsk);
 	unsigned long stack = KSTK_ESP(tsk);
 	unsigned long exec_base;
@@ -241,7 +240,9 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
 	offset = get_paca()->slb_cache_ptr;
 	if (!mmu_has_feature(MMU_FTR_NO_SLBIE_B) &&
 	    offset <= SLB_CACHE_ENTRIES) {
+		unsigned long slbie_data;
 		int i;
+
 		asm volatile("isync" : : : "memory");
 		for (i = 0; i < offset; i++) {
 			slbie_data = (unsigned long)get_paca()->slb_cache[i]
@@ -251,15 +252,14 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
 			slbie_data |= SLBIE_C; /* C set for user addresses */
 			asm volatile("slbie %0" : : "r" (slbie_data));
 		}
-		asm volatile("isync" : : : "memory");
-	} else {
-		__slb_flush_and_rebolt();
-	}
 
-	if (!cpu_has_feature(CPU_FTR_ARCH_207S)) {
 		/* Workaround POWER5 < DD2.1 issue */
-		if (offset == 1 || offset > SLB_CACHE_ENTRIES)
+		if (!cpu_has_feature(CPU_FTR_ARCH_207S) && offset == 1)
 			asm volatile("slbie %0" : : "r" (slbie_data));
+
+		asm volatile("isync" : : : "memory");
+	} else {
+		__slb_flush_and_rebolt();
 	}
 
 	get_paca()->slb_cache_ptr = 0;
--
2.18.0
[PATCH 02/12] powerpc/64s/hash: avoid the POWER5 < DD2.1 slb invalidate workaround on POWER8/9
I only have POWER8/9 to test, so just remove it for those.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/kernel/entry_64.S | 2 ++
 arch/powerpc/mm/slb.c | 8 +---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 2206912ea4f0..77a888bfcb53 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -672,7 +672,9 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
 
 	isync
 	slbie	r6
+BEGIN_FTR_SECTION
 	slbie	r6		/* Workaround POWER5 < DD2.1 issue */
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 	slbmte	r7,r0
 	isync
 2:
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 2f162c6e52d4..1c7128c63a4b 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -256,9 +256,11 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
 		__slb_flush_and_rebolt();
 	}
 
-	/* Workaround POWER5 < DD2.1 issue */
-	if (offset == 1 || offset > SLB_CACHE_ENTRIES)
-		asm volatile("slbie %0" : : "r" (slbie_data));
+	if (!cpu_has_feature(CPU_FTR_ARCH_207S)) {
+		/* Workaround POWER5 < DD2.1 issue */
+		if (offset == 1 || offset > SLB_CACHE_ENTRIES)
+			asm volatile("slbie %0" : : "r" (slbie_data));
+	}
 
 	get_paca()->slb_cache_ptr = 0;
 	copy_mm_to_paca(mm);
--
2.18.0
[PATCH 01/12] powerpc/64s/hash: Fix stab_rr off by one initialization
This causes SLB allocation to start 1 beyond the start of the SLB. There is no real problem because after it wraps it starts behaving properly, it's just surprising to see when looking at SLB traces.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/mm/slb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 9f574e59d178..2f162c6e52d4 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -355,7 +355,7 @@ void slb_initialize(void)
 #endif
 	}
 
-	get_paca()->stab_rr = SLB_NUM_BOLTED;
+	get_paca()->stab_rr = SLB_NUM_BOLTED - 1;
 
 	lflags = SLB_VSID_KERNEL | linear_llp;
 	vflags = SLB_VSID_KERNEL | vmalloc_llp;
--
2.18.0
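The off-by-one can be reproduced with a small userspace model. This is a sketch under one assumption: that the round-robin allocator increments stab_rr before using it and wraps back to SLB_NUM_BOLTED, as the assembly miss handler of that era appears to do. next_slot() and SLB_SIZE are illustrative names, not kernel symbols.

```c
#include <assert.h>

#define SLB_NUM_BOLTED	3	/* value at this point in the series */
#define SLB_SIZE	32	/* illustrative SLB size */

/* Advance the round-robin pointer, then return the slot to overwrite.
 * Assumed behaviour: pre-increment, wrap to the first non-bolted slot. */
static unsigned int next_slot(unsigned int *stab_rr)
{
	unsigned int idx = *stab_rr + 1;

	if (idx >= SLB_SIZE)
		idx = SLB_NUM_BOLTED;
	*stab_rr = idx;
	return idx;
}
```

Under that assumption, starting from stab_rr = SLB_NUM_BOLTED hands out slot SLB_NUM_BOLTED + 1 first and leaves the first non-bolted slot unused until the pointer wraps, whereas starting from SLB_NUM_BOLTED - 1 hands out SLB_NUM_BOLTED first.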
[PATCH 00/12] SLB miss conversion to C, and SLB optimisations
This is a repost of the SLB conversion to C, no real change since last post. But given that it slows down the SLB miss handler, I promised some optimisations could be made to mitigate that. The two main optimisations after the C conversion are the SLB allocation bitmaps, and the preload cache.

Thanks,
Nick

Nicholas Piggin (12):
  powerpc/64s/hash: Fix stab_rr off by one initialization
  powerpc/64s/hash: avoid the POWER5 < DD2.1 slb invalidate workaround on POWER8/9
  powerpc/64s/hash: move POWER5 < DD2.1 slbie workaround where it is needed
  powerpc/64s/hash: remove the vmalloc segment from the bolted SLB
  powerpc/64s/hash: Use POWER6 SLBIA IH=1 variant in switch_slb
  powerpc/64s/hash: Use POWER9 SLBIA IH=3 variant in switch_slb
  powerpc/64s/hash: convert SLB miss handlers to C
  powerpc/64s/hash: remove user SLB data from the paca
  powerpc/64s/hash: SLB allocation status bitmaps
  powerpc/64s: xmon do not dump hash fields when using radix mode
  powerpc/64s/hash: provide arch_setup_exec hooks for hash slice setup
  powerpc/64s/hash: Add a SLB preload cache

 arch/powerpc/include/asm/asm-prototypes.h | 2 +
 arch/powerpc/include/asm/book3s/64/mmu-hash.h | 5 +-
 arch/powerpc/include/asm/exception-64s.h | 8 -
 arch/powerpc/include/asm/paca.h | 19 +-
 arch/powerpc/include/asm/processor.h | 1 +
 arch/powerpc/include/asm/slice.h | 1 +
 arch/powerpc/include/asm/thread_info.h| 11 +
 arch/powerpc/kernel/asm-offsets.c | 11 +-
 arch/powerpc/kernel/entry_64.S| 2 +
 arch/powerpc/kernel/exceptions-64s.S | 202 ++
 arch/powerpc/kernel/paca.c| 21 -
 arch/powerpc/kernel/process.c | 16 +
 arch/powerpc/mm/Makefile | 2 +-
 arch/powerpc/mm/hash_utils_64.c | 46 +-
 arch/powerpc/mm/mmu_context.c | 3 +-
 arch/powerpc/mm/mmu_context_book3s64.c| 9 +
 arch/powerpc/mm/slb.c | 596 --
 arch/powerpc/mm/slb_low.S | 335 --
 arch/powerpc/mm/slice.c | 43 +-
 arch/powerpc/xmon/xmon.c | 37 +-
 20 files changed, 540 insertions(+), 830 deletions(-)
 delete mode 100644 arch/powerpc/mm/slb_low.S

--
2.18.0
Re: [PATCH] powerpc/fadump: re-register firmware-assisted dump if already registered
On Fri, 14 Sep 2018 19:36:02 +0530 Hari Bathini wrote:

> Firmware-Assisted Dump (FADump) needs to be registered again after any
> memory hot add/remove operation to update the crash memory ranges. But
> currently, the kernel returns '-EEXIST' if we try to register without
> unregistering it first. This could expose the system to racing issues
> while unregistering and registering FADump from userspace during udev
> events. Spare the userspace of this and let it be taken care of in the
> kernel space for a simpler interface.
>
> Since this change, running 'echo 1 > /sys/kernel/fadump_registered'
> would result in re-registering (unregistering and registering) FADump,
> if it was already registered.

Great improvement to the API! Any suggestions what should be done in a client which tries to be compatible with kernels before this change and after this change?

Petr T

> Signed-off-by: Hari Bathini
> ---
>  arch/powerpc/kernel/fadump.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index a711d22..761b28b 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -1444,8 +1444,8 @@ static ssize_t fadump_register_store(struct kobject *kobj,
>  		break;
>  	case 1:
>  		if (fw_dump.dump_registered == 1) {
> -			ret = -EEXIST;
> -			goto unlock_out;
> +			/* Un-register Firmware-assisted dump */
> +			fadump_unregister_dump();
>  		}
>  		/* Register Firmware-assisted dump */
>  		ret = register_fadump();
>
Re: [PATCH v2 05/17] compat_ioctl: move more drivers to generic_compat_ioctl_ptrarg
On Wed, Sep 12, 2018 at 05:08:52PM +0200, Arnd Bergmann wrote:
> The .ioctl and .compat_ioctl file operations have the same prototype so
> they can both point to the same function, which works great almost all
> the time when all the commands are compatible.
>
> One exception is the s390 architecture, where a compat pointer is only
> 31 bit wide, and converting it into a 64-bit pointer requires calling
> compat_ptr(). Most drivers here will never run in s390, but since we now
> have a generic helper for it, it's easy enough to use it consistently.
>
> I double-checked all these drivers to ensure that all ioctl arguments
> are used as pointers or are ignored, but are not interpreted as integer
> values.
>
> Signed-off-by: Arnd Bergmann
> ---
>  fs/btrfs/super.c| 2 +-

Acked-by: David Sterba
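The s390 point quoted above can be shown with a tiny userspace model. This is a sketch, not the kernel helper: the real compat_ptr() returns a void __user pointer, while the two functions below (hypothetical names) return plain integers so the difference is easy to check.

```c
#include <assert.h>
#include <stdint.h>

/* Most architectures: a 32-bit compat pointer is simply zero-extended. */
static uint64_t compat_ptr_generic(uint32_t uptr)
{
	return (uint64_t)uptr;
}

/* s390 model: only the low 31 bits are address bits, so the top bit of
 * the 32-bit value has to be masked off before use. */
static uint64_t compat_ptr_s390(uint32_t uptr)
{
	return (uint64_t)(uptr & 0x7fffffffU);
}
```

A handler that passes the raw ioctl argument straight through behaves like compat_ptr_generic(); on s390 that can leave a stray top bit in the address, which is why the conversion helper matters there.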
[PATCH] powerpc/fadump: re-register firmware-assisted dump if already registered
Firmware-Assisted Dump (FADump) needs to be registered again after any memory hot add/remove operation to update the crash memory ranges. But currently, the kernel returns '-EEXIST' if we try to register without unregistering it first. This could expose the system to racing issues while unregistering and registering FADump from userspace during udev events. Spare the userspace of this and let it be taken care of in the kernel space for a simpler interface.

With this change, running 'echo 1 > /sys/kernel/fadump_registered' results in re-registering (unregistering and registering) FADump, if it was already registered.

Signed-off-by: Hari Bathini
---
 arch/powerpc/kernel/fadump.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index a711d22..761b28b 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1444,8 +1444,8 @@ static ssize_t fadump_register_store(struct kobject *kobj,
 		break;
 	case 1:
 		if (fw_dump.dump_registered == 1) {
-			ret = -EEXIST;
-			goto unlock_out;
+			/* Un-register Firmware-assisted dump */
+			fadump_unregister_dump();
 		}
 		/* Register Firmware-assisted dump */
 		ret = register_fadump();
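One way a userspace client might cope with kernels both before and after this change is to try the plain write first and fall back to the explicit unregister/register sequence only on -EEXIST. The following is a sketch of that idea, not something proposed in this thread: the sysfs write is abstracted behind a hypothetical callback so the logic can be exercised without a real system, and the two *_kernel_write() functions are toy models of the old and new behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical writer callback: returns 0 on success or a negative errno. */
typedef int (*sysfs_write_fn)(const char *path, const char *val);

/* Re-register FADump in a way that works on old and new kernels. */
static int fadump_reregister(const char *path, sysfs_write_fn wr)
{
	int ret = wr(path, "1");

	if (ret != -EEXIST)
		return ret;	/* new kernel did the work, or a real error */

	/* Old kernel: unregister explicitly, then register again. */
	ret = wr(path, "0");
	if (ret)
		return ret;
	return wr(path, "1");
}

/* Toy model of the pre-patch kernel: refuses to register twice. */
static int old_registered = 1;
static int old_kernel_write(const char *path, const char *val)
{
	(void)path;
	if (strcmp(val, "1") == 0) {
		if (old_registered)
			return -EEXIST;
		old_registered = 1;
		return 0;
	}
	old_registered = 0;
	return 0;
}

/* Toy model of the post-patch kernel: every write succeeds. */
static int new_writes;
static int new_kernel_write(const char *path, const char *val)
{
	(void)path;
	(void)val;
	new_writes++;
	return 0;
}
```

On an old kernel the fallback path still has the unregister/register window that the patch description warns about; the sketch only keeps the client working, it does not close that race.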
Re: [PATCH] watchdog: mpc8xxx: provide boot status
On 13/09/2018 at 22:25, Guenter Roeck wrote:
> On Thu, Sep 13, 2018 at 08:07:21AM +, Christophe Leroy wrote:
>> mpc8xxx watchdog driver supports the following platforms:
>> - mpc8xx
>> - mpc83xx
>> - mpc86xx
>>
>> Those three platforms have a 32 bits register which provides the
>> reason of the last boot, including whether it was caused by the
>> watchdog.
>>
>> mpc8xx: Register RSR, bit SWRS (bit 3)
>> mpc83xx: Register RSR, bit SWRS (bit 28)
>> mpc86xx: Register RSTRSCR, bit WDT_RR (bit 11)
>>
>> This patch maps the register as defined in the device tree and updates
>> wdt.bootstatus based on the value of the watchdog related bit. Then
>> the information can be retrieved via the WDIOC_GETBOOTSTATUS ioctl.
>>
>> Hereunder is an exemple of devicetree for mpc8xx,
>
> example

ok

>> the Reset Status Register being at offset 0x288:
>>
>> WDT: watchdog@0 {
>> 	compatible = "fsl,mpc823-wdt";
>> 	reg = <0x0 0x10 0x288 0x4>;
>
> This isn't documented anywhere, and no one will know how to use it.
> So far that was grandfathered in, but with more complex usage it really
> needs to be documented.

Ok, added a binding

>> };
>>
>> On the mpc83xx, RSR is at offset 0x910
>> On the mpc86xx, RSTRSCR is at offset 0xe0094
>>
>> Suggested-by: Radu Rendec
>> Tested-by: Christophe Leroy # On mpc885
>> Signed-off-by: Christophe Leroy
>> ---
>>  drivers/watchdog/mpc8xxx_wdt.c | 22 ++
>>  1 file changed, 22 insertions(+)
>>
>> diff --git a/drivers/watchdog/mpc8xxx_wdt.c b/drivers/watchdog/mpc8xxx_wdt.c
>> index aca2d6323f8a..2951a485a6b4 100644
>> --- a/drivers/watchdog/mpc8xxx_wdt.c
>> +++ b/drivers/watchdog/mpc8xxx_wdt.c
>> @@ -49,10 +49,12 @@ struct mpc8xxx_wdt {
>>  struct mpc8xxx_wdt_type {
>>  	int prescaler;
>>  	bool hw_enabled;
>> +	u32 rsr_mask;
>>  };
>>
>>  struct mpc8xxx_wdt_ddata {
>>  	struct mpc8xxx_wdt __iomem *base;
>> +	u32 __iomem *rsr;
>>  	struct watchdog_device wdd;
>>  	spinlock_t lock;
>>  	u16 swtc;
>> @@ -137,6 +139,7 @@ static int mpc8xxx_wdt_probe(struct platform_device *ofdev)
>>  	struct mpc8xxx_wdt_ddata *ddata;
>>  	u32 freq = fsl_get_sys_freq();
>>  	bool enabled;
>> +	struct device *dev = >dev;
>
> If you introduce this variable, please use it everywhere in the function.

ok, introduced it and changed every pr_xxx() to dev_xxx() in a preceding patch.

>>
>>  	wdt_type = of_device_get_match_data(>dev);
>>  	if (!wdt_type)
>> @@ -160,6 +163,22 @@ static int mpc8xxx_wdt_probe(struct platform_device *ofdev)
>>  		return -ENODEV;
>>  	}
>>
>> +	res = platform_get_resource(ofdev, IORESOURCE_MEM, 1);
>> +	ddata->rsr = devm_ioremap_resource(dev, res);
>> +	if (IS_ERR(ddata->rsr)) {
>> +		dev_info(dev, "Could not map reset status register");
>
> Please, no such message. It would start to show up everywhere unless
> devicetree files are updated, which likely won't happen. Then we get
> bogged down by people asking where this message suddenly comes from.

ok

>> +	} else {
>> +		u32 rsr_v = in_be32(ddata->rsr);
>> +		bool status = rsr_v & wdt_type->rsr_mask;
>> +
>> +		ddata->wdd.bootstatus = status ? WDIOF_CARDRESET : 0;
>> +		/* clear reset status bits related to watchdog time */
>> +		out_be32(ddata->rsr, wdt_type->rsr_mask);
>> +
>> +		dev_info(dev, "Last boot was %s by watchdog (RSR = 0x%8.8x)\n",
>> +			 status ? "caused" : "not caused", rsr_v);
>
> The hex value of RSR may be interesting for developers, but not for
> users. Please drop. Also, "caused" is redundant. Add it to the base
> string and add "not " when needed.

Ok, I did it in v2, although I find the code less readable.

>> +	}
>> +
>>  	spin_lock_init(>lock);
>>
>>  	ddata->wdd.info = _wdt_info,
>> @@ -216,6 +235,7 @@ static const struct of_device_id mpc8xxx_wdt_match[] = {
>>  		.compatible = "mpc83xx_wdt",
>>  		.data = &(struct mpc8xxx_wdt_type) {
>>  			.prescaler = 0x1,
>> +			.rsr_mask = BIT(3), /* RSR Bit 28 */
>
> The comment is quite useless. How does BIT(3) match RSR bit 28 ? I am
> sure it is because the HW manual counts bits the other way, but here it
> is just confusing and thus doesn't add value unless you provide
> additional context.

Ok, put the BIT names instead.

Thanks for the review
Christophe

>>  		},
>>  	},
>>  	{
>> @@ -223,6 +243,7 @@ static const struct of_device_id mpc8xxx_wdt_match[] = {
>>  		.data = &(struct mpc8xxx_wdt_type) {
>>  			.prescaler = 0x1,
>>  			.hw_enabled = true,
>> +			.rsr_mask = BIT(20), /* RSTRSCR Bit 11 */
>>  		},
>>  	},
>>  	{
>> @@ -230,6 +251,7 @@ static const struct of_device_id mpc8xxx_wdt_match[] = {
>>  		.data = &(struct mpc8xxx_wdt_type) {
>>  			.prescaler = 0x800,
>>  			.hw_enabled = true,
>> +			.rsr_mask = BIT(28), /* RSR Bit 3 */
[PATCH v2 3/3] dt-bindings: watchdog: add mpc8xxx-wdt support
Add description of DT bindings for mpc8xxx-wdt driver which handles the CPU watchdog timer on the mpc83xx, mpc86xx and mpc8xx.

Signed-off-by: Christophe Leroy
---
 .../devicetree/bindings/watchdog/mpc8xxx-wdt.txt | 25 ++
 1 file changed, 25 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/watchdog/mpc8xxx-wdt.txt

diff --git a/Documentation/devicetree/bindings/watchdog/mpc8xxx-wdt.txt b/Documentation/devicetree/bindings/watchdog/mpc8xxx-wdt.txt
new file mode 100644
index ..1d99e1e4d306
--- /dev/null
+++ b/Documentation/devicetree/bindings/watchdog/mpc8xxx-wdt.txt
@@ -0,0 +1,25 @@
+* Freescale mpc8xxx watchdog driver (For 83xx, 86xx and 8xx)
+
+Required properties:
+- compatible: Shall contain one of the following:
+	"mpc83xx_wdt" for an mpc83xx
+	"fsl,mpc8610-wdt" for an mpc86xx
+	"fsl,mpc823-wdt" for an mpc8xx
+- reg: base physical address and length of the area hosting the
+       watchdog registers.
+		On the 83xx, "Watchdog Timer Registers" area: <0x200 0x100>
+		On the 86xx, "Watchdog Timer Registers" area: <0xe4000 0x100>
+		On the 8xx, "General System Interface Unit" area: <0x0 0x10>
+
+Optional properties:
+- reg: additional physical address and length (4) of location of the
+       Reset Status Register (called RSTRSCR on the mpc86xx)
+		On the 83xx, it is located at offset 0x910
+		On the 86xx, it is located at offset 0xe0094
+		On the 8xx, it is located at offset 0x288
+
+Example:
+	WDT: watchdog@0 {
+	    compatible = "fsl,mpc823-wdt";
+	    reg = <0x0 0x10 0x288 0x4>;
+	};
--
2.13.3
[PATCH v2 2/3] watchdog: mpc8xxx: provide boot status
mpc8xxx watchdog driver supports the following platforms: - mpc8xx - mpc83xx - mpc86xx Each of these three platforms has a 32-bit register which reports the reason for the last boot, including whether it was caused by the watchdog. mpc8xx: Register RSR, bit SWRS (bit 3) mpc83xx: Register RSR, bit SWRS (bit 28) mpc86xx: Register RSTRSCR, bit WDT_RR (bit 11) This patch maps the register as defined in the device tree and updates wdt.bootstatus based on the value of the watchdog related bit. Then the information can be retrieved via the WDIOC_GETBOOTSTATUS ioctl. Below is an example of devicetree for mpc8xx, the Reset Status Register being at offset 0x288: WDT: watchdog@0 { compatible = "fsl,mpc823-wdt"; reg = <0x0 0x10 0x288 0x4>; }; On the mpc83xx, RSR is at offset 0x910 On the mpc86xx, RSTRSCR is at offset 0xe0094 Suggested-by: Radu Rendec Tested-by: Christophe Leroy # On mpc885 Signed-off-by: Christophe Leroy --- drivers/watchdog/mpc8xxx_wdt.c | 20 1 file changed, 20 insertions(+) diff --git a/drivers/watchdog/mpc8xxx_wdt.c b/drivers/watchdog/mpc8xxx_wdt.c index 1dcf5f10cdd9..4a4700458b17 100644 --- a/drivers/watchdog/mpc8xxx_wdt.c +++ b/drivers/watchdog/mpc8xxx_wdt.c @@ -47,6 +47,7 @@ struct mpc8xxx_wdt { struct mpc8xxx_wdt_type { int prescaler; bool hw_enabled; + u32 rsr_mask; }; struct mpc8xxx_wdt_ddata { @@ -136,6 +137,7 @@ static int mpc8xxx_wdt_probe(struct platform_device *ofdev) u32 freq = fsl_get_sys_freq(); bool enabled; struct device *dev = >dev; + u32 __iomem *rsr = NULL; wdt_type = of_device_get_match_data(dev); if (!wdt_type) @@ -159,6 +161,21 @@ static int mpc8xxx_wdt_probe(struct platform_device *ofdev) return -ENODEV; } + res = platform_get_resource(ofdev, IORESOURCE_MEM, 1); + if (res) + rsr = ioremap(res->start, resource_size(res)); + if (rsr) { + bool status = in_be32(rsr) & wdt_type->rsr_mask; + + ddata->wdd.bootstatus = status ? 
WDIOF_CARDRESET : 0; +/* clear reset status bits related to watchdog timer */ + out_be32(rsr, wdt_type->rsr_mask); + iounmap(rsr); + + dev_info(dev, "Last boot was %scaused by watchdog\n", +status ? "" : "not "); + } + spin_lock_init(>lock); ddata->wdd.info = _wdt_info, @@ -216,6 +233,7 @@ static const struct of_device_id mpc8xxx_wdt_match[] = { .compatible = "mpc83xx_wdt", .data = &(struct mpc8xxx_wdt_type) { .prescaler = 0x1, + .rsr_mask = BIT(3), /* RSR Bit SWRS */ }, }, { @@ -223,6 +241,7 @@ static const struct of_device_id mpc8xxx_wdt_match[] = { .data = &(struct mpc8xxx_wdt_type) { .prescaler = 0x1, .hw_enabled = true, + .rsr_mask = BIT(20), /* RSTRSCR Bit WDT_RR */ }, }, { @@ -230,6 +249,7 @@ static const struct of_device_id mpc8xxx_wdt_match[] = { .data = &(struct mpc8xxx_wdt_type) { .prescaler = 0x800, .hw_enabled = true, + .rsr_mask = BIT(28), /* RSR Bit SWRS */ }, }, {}, -- 2.13.3
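The relation between the bit numbers in the commit message and the BIT() values in the match table becomes clearer once the two numbering schemes are written side by side. This sketch assumes, as the review of v1 suggests, that the hardware manuals number bits from the most significant end (bit 0 = MSB of the 32-bit register); MSB0_BIT32() and LSB0_BIT() are hypothetical helper names, with LSB0_BIT() standing in for the kernel's BIT().

```c
#include <assert.h>
#include <stdint.h>

/* Mask for bit n when bit 0 is the MSB of a 32-bit register
 * (assumed to be the hardware manual convention). */
#define MSB0_BIT32(n)	((uint32_t)1 << (31 - (n)))

/* Mask for bit n when bit 0 is the LSB, as the kernel's BIT() counts. */
#define LSB0_BIT(n)	((uint32_t)1 << (n))
```

With that convention, manual bit 3 (SWRS on the mpc8xx) is BIT(28), manual bit 28 (SWRS on the mpc83xx) is BIT(3), and manual bit 11 (WDT_RR on the mpc86xx) is BIT(20), which matches the three rsr_mask values in the patch.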
[PATCH v2 1/3] watchdog: mpc8xxx: use dev_xxxx() instead of pr_xxxx()
mpc8xxx watchdog driver is a platform device driver, so it is possible to use dev_xxx() messaging rather than pr_xxx() Signed-off-by: Christophe Leroy --- drivers/watchdog/mpc8xxx_wdt.c | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/drivers/watchdog/mpc8xxx_wdt.c b/drivers/watchdog/mpc8xxx_wdt.c index aca2d6323f8a..1dcf5f10cdd9 100644 --- a/drivers/watchdog/mpc8xxx_wdt.c +++ b/drivers/watchdog/mpc8xxx_wdt.c @@ -17,8 +17,6 @@ * option) any later version. */ -#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt - #include #include #include @@ -137,26 +135,27 @@ static int mpc8xxx_wdt_probe(struct platform_device *ofdev) struct mpc8xxx_wdt_ddata *ddata; u32 freq = fsl_get_sys_freq(); bool enabled; + struct device *dev = >dev; - wdt_type = of_device_get_match_data(>dev); + wdt_type = of_device_get_match_data(dev); if (!wdt_type) return -EINVAL; if (!freq || freq == -1) return -EINVAL; - ddata = devm_kzalloc(>dev, sizeof(*ddata), GFP_KERNEL); + ddata = devm_kzalloc(dev, sizeof(*ddata), GFP_KERNEL); if (!ddata) return -ENOMEM; res = platform_get_resource(ofdev, IORESOURCE_MEM, 0); - ddata->base = devm_ioremap_resource(>dev, res); + ddata->base = devm_ioremap_resource(dev, res); if (IS_ERR(ddata->base)) return PTR_ERR(ddata->base); enabled = in_be32(>base->swcrr) & SWCRR_SWEN; if (!enabled && wdt_type->hw_enabled) { - pr_info("could not be enabled in software\n"); + dev_info(dev, "could not be enabled in software\n"); return -ENODEV; } @@ -166,7 +165,7 @@ static int mpc8xxx_wdt_probe(struct platform_device *ofdev) ddata->wdd.ops = _wdt_ops, ddata->wdd.timeout = WATCHDOG_TIMEOUT; - watchdog_init_timeout(>wdd, timeout, >dev); + watchdog_init_timeout(>wdd, timeout, dev); watchdog_set_nowayout(>wdd, nowayout); @@ -189,12 +188,13 @@ static int mpc8xxx_wdt_probe(struct platform_device *ofdev) ret = watchdog_register_device(>wdd); if (ret) { - pr_err("cannot register watchdog device (err=%d)\n", ret); + dev_err(dev, "cannot register watchdog device 
(err=%d)\n", ret); return ret; } - pr_info("WDT driver for MPC8xxx initialized. mode:%s timeout=%d sec\n", - reset ? "reset" : "interrupt", ddata->wdd.timeout); + dev_info(dev, +"WDT driver for MPC8xxx initialized. mode:%s timeout=%d sec\n", +reset ? "reset" : "interrupt", ddata->wdd.timeout); platform_set_drvdata(ofdev, ddata); return 0; @@ -204,8 +204,8 @@ static int mpc8xxx_wdt_remove(struct platform_device *ofdev) { struct mpc8xxx_wdt_ddata *ddata = platform_get_drvdata(ofdev); - pr_crit("Watchdog removed, expect the %s soon!\n", - reset ? "reset" : "machine check exception"); + dev_crit(>dev, "Watchdog removed, expect the %s soon!\n", +reset ? "reset" : "machine check exception"); watchdog_unregister_device(>wdd); return 0; -- 2.13.3
[PATCH] kdb: use correct pointer when 'btc' calls 'btt'
On a powerpc 8xx, 'btc' fails as follows:

Entering kdb (current=0x(ptrval), pid 282) due to Keyboard Entry
kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0
kdb_getarea: Bad address 0x0

When booting the kernel with 'debug_boot_weak_hash', it fails as well:

Entering kdb (current=0xba99ad80, pid 284) due to Keyboard Entry
kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0
kdb_getarea: Bad address 0xba99ad80

On other platforms, Oopses have been observed too, see
https://github.com/linuxppc/linux/issues/139

This is due to btc calling 'btt' with a %p pointer as an argument.

This patch replaces %p with %px to get the real pointer value, as
expected by 'btt'.

Signed-off-by: Christophe Leroy
Cc: # 4.15+
---
 kernel/debug/kdb/kdb_bt.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/debug/kdb/kdb_bt.c b/kernel/debug/kdb/kdb_bt.c
index 6ad4a9fcbd6f..7921ae4fca8d 100644
--- a/kernel/debug/kdb/kdb_bt.c
+++ b/kernel/debug/kdb/kdb_bt.c
@@ -179,14 +179,14 @@ kdb_bt(int argc, const char **argv)
 				kdb_printf("no process for cpu %ld\n", cpu);
 				return 0;
 			}
-			sprintf(buf, "btt 0x%p\n", KDB_TSK(cpu));
+			sprintf(buf, "btt 0x%px\n", KDB_TSK(cpu));
 			kdb_parse(buf);
 			return 0;
 		}
 		kdb_printf("btc: cpu status: ");
 		kdb_parse("cpu\n");
 		for_each_online_cpu(cpu) {
-			sprintf(buf, "btt 0x%p\n", KDB_TSK(cpu));
+			sprintf(buf, "btt 0x%px\n", KDB_TSK(cpu));
 			kdb_parse(buf);
 			touch_nmi_watchdog();
 		}
-- 
2.13.3
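For readers who want to see why the format specifier matters here, the following is a minimal userspace sketch, not kernel code. It builds a "btt" command string and parses the address back out, the way 'btc' hands an address to 'btt'. The names format_btt_cmd(), parse_btt_cmd(), survives_round_trip() and toy_scramble() are all hypothetical, and toy_scramble() is only a stand-in for whatever transformation %p applies to the printed value; it is not the kernel's hashing algorithm.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical scrambler standing in for the kernel's %p hashing. */
static uintptr_t toy_scramble(uintptr_t value)
{
	return value ^ (uintptr_t)0x5bd1e995u;
}

/* Build a "btt" command: raw != 0 mimics %px, raw == 0 mimics %p. */
void format_btt_cmd(char *buf, size_t len, const void *task, int raw)
{
	uintptr_t value = (uintptr_t)task;

	if (!raw)
		value = toy_scramble(value);
	snprintf(buf, len, "btt 0x%" PRIxPTR "\n", value);
}

/* Recover the address argument, as a command parser would. */
uintptr_t parse_btt_cmd(const char *cmd)
{
	/* Skip the "btt " prefix; strtoull() accepts the 0x prefix in base 16. */
	return (uintptr_t)strtoull(cmd + 4, NULL, 16);
}

/* Return 1 if the parsed address equals the task pointer that was printed. */
int survives_round_trip(const void *task, int raw)
{
	char buf[64];

	format_btt_cmd(buf, sizeof(buf), task, raw);
	return parse_btt_cmd(buf) == (uintptr_t)task;
}
```

In this sketch only the raw variant survives the round trip, which is the property 'btt' relies on when it dereferences the address it was given.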
[PATCH 30/30] docs/boot-time-mm: remove bootmem documentation
Signed-off-by: Mike Rapoport
---
 Documentation/core-api/boot-time-mm.rst | 71 +++++----------------------------
 1 file changed, 10 insertions(+), 61 deletions(-)

diff --git a/Documentation/core-api/boot-time-mm.rst b/Documentation/core-api/boot-time-mm.rst
index 03cb164..e5ec9f1 100644
--- a/Documentation/core-api/boot-time-mm.rst
+++ b/Documentation/core-api/boot-time-mm.rst
@@ -5,54 +5,23 @@ Boot time memory management
 Early system initialization cannot use "normal" memory management
 simply because it is not set up yet. But there is still need to
 allocate memory for various data structures, for instance for the
-physical page allocator. To address this, a specialized allocator
-called the :ref:`Boot Memory Allocator <bootmem>`, or bootmem, was
-introduced. Several years later PowerPC developers added a "Logical
-Memory Blocks" allocator, which was later adopted by other
-architectures and renamed to :ref:`memblock <memblock>`. There is also
-a compatibility layer called `nobootmem` that translates bootmem
-allocation interfaces to memblock calls.
+physical page allocator.
 
-The selection of the early allocator is done using
-``CONFIG_NO_BOOTMEM`` and ``CONFIG_HAVE_MEMBLOCK`` kernel
-configuration options. These options are enabled or disabled
-statically by the architectures' Kconfig files.
-
-* Architectures that rely only on bootmem select
-  ``CONFIG_NO_BOOTMEM=n && CONFIG_HAVE_MEMBLOCK=n``.
-* The users of memblock with the nobootmem compatibility layer set
-  ``CONFIG_NO_BOOTMEM=y && CONFIG_HAVE_MEMBLOCK=y``.
-* And for those that use both memblock and bootmem the configuration
-  includes ``CONFIG_NO_BOOTMEM=n && CONFIG_HAVE_MEMBLOCK=y``.
-
-Whichever allocator is used, it is the responsibility of the
-architecture specific initialization to set it up in
-:c:func:`setup_arch` and tear it down in :c:func:`mem_init` functions.
+A specialized allocator called ``memblock`` performs the
+boot time memory management. The architecture specific initialization
+must set it up in :c:func:`setup_arch` and tear it down in
+:c:func:`mem_init` functions.
 
 Once the early memory management is available it offers a variety of
 functions and macros for memory allocations. The allocation request
 may be directed to the first (and probably the only) node or to a
 particular node in a NUMA system. There are API variants that panic
-when an allocation fails and those that don't. And more recent and
-advanced memblock even allows controlling its own behaviour.
-
-.. _bootmem:
-
-Bootmem
-=======
-
-(mostly stolen from Mel Gorman's "Understanding the Linux Virtual
-Memory Manager" `book`_)
-
-.. _book: https://www.kernel.org/doc/gorman/
-
-.. kernel-doc:: mm/bootmem.c
-   :doc: bootmem overview
+when an allocation fails and those that don't.
 
-.. _memblock:
+Memblock also offers a variety of APIs that control its own behaviour.
 
-Memblock
-========
+Memblock Overview
+=================
 
 .. kernel-doc:: mm/memblock.c
    :doc: memblock overview
@@ -61,26 +30,6 @@ Memblock
 Functions and structures
 ========================
 
-Common API
--------------
-
-The functions that are described in this section are available
-regardless of what early memory manager is enabled.
-
-.. kernel-doc:: mm/nobootmem.c
-
-Bootmem specific API
---------------------
-
-These interfaces available only with bootmem, i.e when ``CONFIG_NO_BOOTMEM=n``
-
-.. kernel-doc:: include/linux/bootmem.h
-.. kernel-doc:: mm/bootmem.c
-   :nodocs:
-
-Memblock specific API
----------------------
-
 Here is the description of memblock data structures, functions and
 macros. Some of them are actually internal, but since they are
 documented it would be silly to omit them. Besides, reading the
@@ -89,4 +38,4 @@ really happens under the hood.
 
 .. kernel-doc:: include/linux/memblock.h
 .. kernel-doc:: mm/memblock.c
-   :nodocs:
+   :functions:
-- 
2.7.4
[PATCH 29/30] mm: remove include/linux/bootmem.h
Move remaining definitions and declarations from include/linux/bootmem.h
into include/linux/memblock.h and remove the redundant header.

The includes were replaced with the semantic patch below, followed by
semi-automated removal of duplicated '#include <linux/memblock.h>':

@@
@@
- #include <linux/bootmem.h>
+ #include <linux/memblock.h>

Signed-off-by: Mike Rapoport
---
 arch/alpha/kernel/core_cia.c | 2 +-
 arch/alpha/kernel/core_irongate.c | 1 -
 arch/alpha/kernel/core_marvel.c | 2 +-
 arch/alpha/kernel/core_titan.c | 2 +-
 arch/alpha/kernel/core_tsunami.c | 2 +-
 arch/alpha/kernel/pci-noop.c | 2 +-
 arch/alpha/kernel/pci.c | 2 +-
 arch/alpha/kernel/pci_iommu.c | 2 +-
 arch/alpha/kernel/setup.c | 1 -
 arch/alpha/kernel/sys_nautilus.c | 2 +-
 arch/alpha/mm/init.c | 2 +-
 arch/alpha/mm/numa.c | 1 -
 arch/arc/kernel/unwind.c | 2 +-
 arch/arc/mm/highmem.c | 2 +-
 arch/arc/mm/init.c | 1 -
 arch/arm/kernel/devtree.c | 1 -
 arch/arm/kernel/setup.c | 1 -
 arch/arm/mach-omap2/omap_hwmod.c | 2 +-
 arch/arm/mm/dma-mapping.c | 1 -
 arch/arm/mm/init.c | 1 -
 arch/arm/xen/mm.c | 1 -
 arch/arm/xen/p2m.c | 2 +-
 arch/arm64/kernel/acpi.c | 1 -
 arch/arm64/kernel/acpi_numa.c | 1 -
 arch/arm64/kernel/setup.c | 1 -
 arch/arm64/mm/dma-mapping.c | 2 +-
 arch/arm64/mm/init.c | 1 -
 arch/arm64/mm/kasan_init.c | 1 -
 arch/arm64/mm/numa.c | 1 -
 arch/c6x/kernel/setup.c | 1 -
 arch/c6x/mm/init.c | 2 +-
 arch/h8300/kernel/setup.c | 1 -
 arch/h8300/mm/init.c | 2 +-
 arch/hexagon/kernel/dma.c | 2 +-
 arch/hexagon/kernel/setup.c | 2 +-
 arch/hexagon/mm/init.c | 1 -
 arch/ia64/kernel/crash.c | 2 +-
 arch/ia64/kernel/efi.c | 2 +-
 arch/ia64/kernel/ia64_ksyms.c | 2 +-
 arch/ia64/kernel/iosapic.c | 2 +-
 arch/ia64/kernel/mca.c | 2 +-
 arch/ia64/kernel/mca_drv.c | 2 +-
 arch/ia64/kernel/setup.c | 1 -
 arch/ia64/kernel/smpboot.c | 2 +-
 arch/ia64/kernel/topology.c | 2 +-
 arch/ia64/kernel/unwind.c | 2 +-
 arch/ia64/mm/contig.c | 1 -
 arch/ia64/mm/discontig.c | 1 -
 arch/ia64/mm/init.c | 1 -
 arch/ia64/mm/numa.c | 2 +-
 arch/ia64/mm/tlb.c | 2 +-
 arch/ia64/pci/pci.c | 2 +-
 arch/ia64/sn/kernel/bte.c | 2 +-
 arch/ia64/sn/kernel/io_common.c | 2 +-
 arch/ia64/sn/kernel/setup.c | 2 +-
 arch/m68k/atari/stram.c | 2 +-
 arch/m68k/coldfire/m54xx.c | 2 +-
 arch/m68k/kernel/setup_mm.c | 1 -
 arch/m68k/kernel/setup_no.c | 1 -
 arch/m68k/kernel/uboot.c | 2 +-
 arch/m68k/mm/init.c | 2 +-
 arch/m68k/mm/mcfmmu.c | 1 -
 arch/m68k/mm/motorola.c | 1 -
 arch/m68k/mm/sun3mmu.c | 2 +-
 arch/m68k/sun3/config.c | 2 +-
 arch/m68k/sun3/dvma.c | 2 +-
 arch/m68k/sun3/mmu_emu.c | 2 +-
 arch/m68k/sun3/sun3dvma.c | 2 +-
 arch/m68k/sun3x/dvma.c | 2 +-
 arch/microblaze/mm/consistent.c | 2 +-
 arch/microblaze/mm/init.c | 3 +-
 arch/microblaze/pci/pci-common.c | 2 +-
 arch/mips/ar7/memory.c | 2 +-
 arch/mips/ath79/setup.c | 2 +-
 arch/mips/bcm63xx/prom.c | 2 +-
 arch/mips/bcm63xx/setup.c | 2 +-
 arch/mips/bmips/setup.c | 2 +-
 arch/mips/cavium-octeon/dma-octeon.c | 2 +-
 arch/mips/dec/prom/memory.c | 2 +-
 arch/mips/emma/common/prom.c | 2 +-
 arch/mips/fw/arc/memory.c | 2 +-
 arch/mips/jazz/jazzdma.c | 2 +-
 arch/mips/kernel/crash.c | 2 +-
 arch/mips/kernel/crash_dump.c | 2 +-
 arch/mips/kernel/prom.c | 2 +-
 arch/mips/kernel/setup.c | 1 -
 arch/mips/kernel/traps.c | 1 -
[PATCH 28/30] memblock: replace BOOTMEM_ALLOC_* with MEMBLOCK variants
Drop BOOTMEM_ALLOC_ACCESSIBLE and BOOTMEM_ALLOC_ANYWHERE in favor of identical MEMBLOCK definitions. Signed-off-by: Mike Rapoport Acked-by: Michal Hocko --- arch/ia64/mm/discontig.c | 2 +- arch/powerpc/kernel/setup_64.c | 2 +- arch/sparc/kernel/smp_64.c | 2 +- arch/x86/kernel/setup_percpu.c | 2 +- arch/x86/mm/kasan_init_64.c| 4 ++-- mm/hugetlb.c | 3 ++- mm/kasan/kasan_init.c | 2 +- mm/memblock.c | 8 mm/page_ext.c | 2 +- mm/sparse-vmemmap.c| 3 ++- mm/sparse.c| 5 +++-- 11 files changed, 19 insertions(+), 16 deletions(-) diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c index 918dda9..70609f8 100644 --- a/arch/ia64/mm/discontig.c +++ b/arch/ia64/mm/discontig.c @@ -453,7 +453,7 @@ static void __init *memory_less_node_alloc(int nid, unsigned long pernodesize) ptr = memblock_alloc_try_nid(pernodesize, PERCPU_PAGE_SIZE, __pa(MAX_DMA_ADDRESS), -BOOTMEM_ALLOC_ACCESSIBLE, +MEMBLOCK_ALLOC_ACCESSIBLE, bestnode); return ptr; diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index e564b27..b3e70cc 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -758,7 +758,7 @@ void __init emergency_stack_init(void) static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size, size_t align) { return memblock_alloc_try_nid(size, align, __pa(MAX_DMA_ADDRESS), - BOOTMEM_ALLOC_ACCESSIBLE, + MEMBLOCK_ALLOC_ACCESSIBLE, early_cpu_to_node(cpu)); } diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c index a087a6a..6cc80d0 100644 --- a/arch/sparc/kernel/smp_64.c +++ b/arch/sparc/kernel/smp_64.c @@ -1595,7 +1595,7 @@ static void * __init pcpu_alloc_bootmem(unsigned int cpu, size_t size, cpu, size, __pa(ptr)); } else { ptr = memblock_alloc_try_nid(size, align, goal, -BOOTMEM_ALLOC_ACCESSIBLE, node); +MEMBLOCK_ALLOC_ACCESSIBLE, node); pr_debug("per cpu data for cpu%d %lu bytes on node%d at " "%016lx\n", cpu, size, node, __pa(ptr)); } diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c 
index a006f1b..483412f 100644 --- a/arch/x86/kernel/setup_percpu.c +++ b/arch/x86/kernel/setup_percpu.c @@ -114,7 +114,7 @@ static void * __init pcpu_alloc_bootmem(unsigned int cpu, unsigned long size, cpu, size, __pa(ptr)); } else { ptr = memblock_alloc_try_nid_nopanic(size, align, goal, -BOOTMEM_ALLOC_ACCESSIBLE, +MEMBLOCK_ALLOC_ACCESSIBLE, node); pr_debug("per cpu data for cpu%d %lu bytes on node%d at %016lx\n", diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c index 77b857c..8f87499 100644 --- a/arch/x86/mm/kasan_init_64.c +++ b/arch/x86/mm/kasan_init_64.c @@ -29,10 +29,10 @@ static __init void *early_alloc(size_t size, int nid, bool panic) { if (panic) return memblock_alloc_try_nid(size, size, - __pa(MAX_DMA_ADDRESS), BOOTMEM_ALLOC_ACCESSIBLE, nid); + __pa(MAX_DMA_ADDRESS), MEMBLOCK_ALLOC_ACCESSIBLE, nid); else return memblock_alloc_try_nid_nopanic(size, size, - __pa(MAX_DMA_ADDRESS), BOOTMEM_ALLOC_ACCESSIBLE, nid); + __pa(MAX_DMA_ADDRESS), MEMBLOCK_ALLOC_ACCESSIBLE, nid); } static void __init kasan_populate_pmd(pmd_t *pmd, unsigned long addr, diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 3b63370..67629dc 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include #include @@ -2102,7 +2103,7 @@ int __alloc_bootmem_huge_page(struct hstate *h) addr = memblock_alloc_try_nid_raw( huge_page_size(h), huge_page_size(h), - 0, BOOTMEM_ALLOC_ACCESSIBLE, node); + 0, MEMBLOCK_ALLOC_ACCESSIBLE, node); if (addr) { /* * Use the beginning of the huge page to store the diff --git a/mm/kasan/kasan_init.c b/mm/kasan/kasan_init.c index 24d734b..785a970 100644 --- a/mm/kasan/kasan_init.c +++ b/mm/kasan/kasan_init.c @@ -84,7 +84,7 @@ static inline bool kasan_zero_page_entry(pte_t pte) static __init void *early_alloc(size_t size, int node) {
[PATCH 27/30] mm: remove nobootmem
Move a few remaining functions from nobootmem.c to memblock.c and remove nobootmem Signed-off-by: Mike Rapoport Acked-by: Michal Hocko --- mm/Makefile| 1 - mm/memblock.c | 104 ++ mm/nobootmem.c | 128 - 3 files changed, 104 insertions(+), 129 deletions(-) delete mode 100644 mm/nobootmem.c diff --git a/mm/Makefile b/mm/Makefile index ca3c844..d210cc9 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -42,7 +42,6 @@ obj-y := filemap.o mempool.o oom_kill.o fadvise.o \ debug.o $(mmu-y) obj-y += init-mm.o -obj-y += nobootmem.o obj-y += memblock.o ifdef CONFIG_MMU diff --git a/mm/memblock.c b/mm/memblock.c index a2cd61d..4591f38 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -82,6 +82,16 @@ * initialization compltes. */ +#ifndef CONFIG_NEED_MULTIPLE_NODES +struct pglist_data __refdata contig_page_data; +EXPORT_SYMBOL(contig_page_data); +#endif + +unsigned long max_low_pfn; +unsigned long min_low_pfn; +unsigned long max_pfn; +unsigned long long max_possible_pfn; + static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock; static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock; #ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP @@ -1929,6 +1939,100 @@ static int __init early_memblock(char *p) } early_param("memblock", early_memblock); +static void __init __free_pages_memory(unsigned long start, unsigned long end) +{ + int order; + + while (start < end) { + order = min(MAX_ORDER - 1UL, __ffs(start)); + + while (start + (1UL << order) > end) + order--; + + memblock_free_pages(pfn_to_page(start), start, order); + + start += (1UL << order); + } +} + +static unsigned long __init __free_memory_core(phys_addr_t start, +phys_addr_t end) +{ + unsigned long start_pfn = PFN_UP(start); + unsigned long end_pfn = min_t(unsigned long, + PFN_DOWN(end), max_low_pfn); + + if (start_pfn >= end_pfn) + return 0; + + __free_pages_memory(start_pfn, end_pfn); + + return end_pfn - start_pfn; +} + +static unsigned long 
__init free_low_memory_core_early(void) +{ + unsigned long count = 0; + phys_addr_t start, end; + u64 i; + + memblock_clear_hotplug(0, -1); + + for_each_reserved_mem_region(i, &start, &end) + reserve_bootmem_region(start, end); + + /* +* We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id +* because in some case like Node0 doesn't have RAM installed +* low ram will be on Node1 +*/ + for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, + NULL) + count += __free_memory_core(start, end); + + return count; +} + +static int reset_managed_pages_done __initdata; + +void reset_node_managed_pages(pg_data_t *pgdat) +{ + struct zone *z; + + for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++) + z->managed_pages = 0; +} + +void __init reset_all_zones_managed_pages(void) +{ + struct pglist_data *pgdat; + + if (reset_managed_pages_done) + return; + + for_each_online_pgdat(pgdat) + reset_node_managed_pages(pgdat); + + reset_managed_pages_done = 1; +} + +/** + * memblock_free_all - release free pages to the buddy allocator + * + * Return: the number of pages actually released. + */ +unsigned long __init memblock_free_all(void) +{ + unsigned long pages; + + reset_all_zones_managed_pages(); + + pages = free_low_memory_core_early(); + totalram_pages += pages; + + return pages; +} + #if defined(CONFIG_DEBUG_FS) && !defined(CONFIG_ARCH_DISCARD_MEMBLOCK) static int memblock_debug_show(struct seq_file *m, void *private) diff --git a/mm/nobootmem.c b/mm/nobootmem.c deleted file mode 100644 index 9608bc5..0000000 --- a/mm/nobootmem.c +++ /dev/null @@ -1,128 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * bootmem - A boot-time physical memory allocator and configurator - * - * Copyright (C) 1999 Ingo Molnar - *1999 Kanoj Sarcar, SGI - *2008 Johannes Weiner - * - * Access to this subsystem has to be serialized externally (which is true - * for the boot process anyway). 
- */ -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include - -#include "internal.h" - -#ifndef CONFIG_NEED_MULTIPLE_NODES -struct pglist_data __refdata contig_page_data; -EXPORT_SYMBOL(contig_page_data); -#endif - -unsigned long max_low_pfn; -unsigned long min_low_pfn; -unsigned long max_pfn; -unsigned long long max_possible_pfn; - -static void __init
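As an aside on the __free_pages_memory() loop that this patch moves into mm/memblock.c: it walks a page frame range and releases it in the largest naturally aligned power-of-two blocks that fit. The following is a userspace sketch of the same splitting logic, under stated assumptions. TOY_MAX_ORDER, struct toy_block, lowest_set_bit() and split_pfn_range() are hypothetical names; the value 11 is only an assumed stand-in for the kernel's configurable MAX_ORDER, and the explicit check for start == 0 is an addition of this sketch, not something taken from the patch.

```c
#include <stddef.h>

#define TOY_MAX_ORDER 11UL /* assumed value; the kernel's MAX_ORDER is configurable */

/* One released block: first page frame number and its order (2^order pages). */
struct toy_block {
	unsigned long pfn;
	unsigned long order;
};

/* Index of the lowest set bit; the caller must pass a non-zero value. */
static unsigned long lowest_set_bit(unsigned long value)
{
	unsigned long bit = 0;

	while (!(value & 1UL)) {
		value >>= 1;
		bit++;
	}
	return bit;
}

/*
 * Split [start, end) into aligned power-of-two blocks, largest first
 * where alignment allows. Returns the number of blocks written to out.
 */
size_t split_pfn_range(unsigned long start, unsigned long end,
		       struct toy_block *out, size_t max_blocks)
{
	size_t n = 0;

	while (start < end && n < max_blocks) {
		/* Alignment of start caps the order; pfn 0 is aligned to anything. */
		unsigned long order = TOY_MAX_ORDER - 1UL;

		if (start != 0 && lowest_set_bit(start) < order)
			order = lowest_set_bit(start);

		/* Shrink the block until it no longer overruns the range. */
		while (start + (1UL << order) > end)
			order--;

		out[n].pfn = start;
		out[n].order = order;
		n++;

		start += 1UL << order;
	}
	return n;
}
```

For the range [5, 20) this yields four blocks: one page at pfn 5, two pages at pfn 6, eight pages at pfn 8 and four pages at pfn 16, which together cover all 15 pages.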
[PATCH 26/30] memblock: rename __free_pages_bootmem to memblock_free_pages
The conversion is done using sed -i 's@__free_pages_bootmem@memblock_free_pages@' \ $(git grep -l __free_pages_bootmem) Signed-off-by: Mike Rapoport Acked-by: Michal Hocko --- mm/internal.h | 2 +- mm/memblock.c | 2 +- mm/nobootmem.c | 2 +- mm/page_alloc.c | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index 87256ae..291eb2b 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -161,7 +161,7 @@ static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn, } extern int __isolate_free_page(struct page *page, unsigned int order); -extern void __free_pages_bootmem(struct page *page, unsigned long pfn, +extern void memblock_free_pages(struct page *page, unsigned long pfn, unsigned int order); extern void prep_compound_page(struct page *page, unsigned int order); extern void post_alloc_hook(struct page *page, unsigned int order, diff --git a/mm/memblock.c b/mm/memblock.c index 1534edb..a2cd61d 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -1615,7 +1615,7 @@ void __init __memblock_free_late(phys_addr_t base, phys_addr_t size) end = PFN_DOWN(base + size); for (; cursor < end; cursor++) { - __free_pages_bootmem(pfn_to_page(cursor), cursor, 0); + memblock_free_pages(pfn_to_page(cursor), cursor, 0); totalram_pages++; } } diff --git a/mm/nobootmem.c b/mm/nobootmem.c index bb64b09..9608bc5 100644 --- a/mm/nobootmem.c +++ b/mm/nobootmem.c @@ -43,7 +43,7 @@ static void __init __free_pages_memory(unsigned long start, unsigned long end) while (start + (1UL << order) > end) order--; - __free_pages_bootmem(pfn_to_page(start), start, order); + memblock_free_pages(pfn_to_page(start), start, order); start += (1UL << order); } diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 13e394c..f4a8bc8 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1334,7 +1334,7 @@ meminit_pfn_in_nid(unsigned long pfn, int node, #endif -void __init __free_pages_bootmem(struct page *page, unsigned long pfn, +void __init 
memblock_free_pages(struct page *page, unsigned long pfn, unsigned int order) { if (early_page_uninitialised(pfn)) -- 2.7.4
[PATCH 25/30] memblock: rename free_all_bootmem to memblock_free_all
The conversion is done using sed -i 's@free_all_bootmem@memblock_free_all@' \ $(git grep -l free_all_bootmem) Signed-off-by: Mike Rapoport Acked-by: Michal Hocko --- arch/alpha/mm/init.c | 2 +- arch/arc/mm/init.c | 2 +- arch/arm/mm/init.c | 2 +- arch/arm64/mm/init.c | 2 +- arch/c6x/mm/init.c | 2 +- arch/h8300/mm/init.c | 2 +- arch/hexagon/mm/init.c | 2 +- arch/ia64/mm/init.c| 2 +- arch/m68k/mm/init.c| 2 +- arch/microblaze/mm/init.c | 2 +- arch/mips/loongson64/loongson-3/numa.c | 2 +- arch/mips/mm/init.c| 2 +- arch/mips/sgi-ip27/ip27-memory.c | 2 +- arch/nds32/mm/init.c | 2 +- arch/nios2/mm/init.c | 2 +- arch/openrisc/mm/init.c| 2 +- arch/parisc/mm/init.c | 2 +- arch/powerpc/mm/mem.c | 2 +- arch/riscv/mm/init.c | 2 +- arch/s390/mm/init.c| 2 +- arch/sh/mm/init.c | 2 +- arch/sparc/mm/init_32.c| 2 +- arch/sparc/mm/init_64.c| 4 ++-- arch/um/kernel/mem.c | 2 +- arch/unicore32/mm/init.c | 2 +- arch/x86/mm/highmem_32.c | 2 +- arch/x86/mm/init_32.c | 4 ++-- arch/x86/mm/init_64.c | 4 ++-- arch/x86/xen/mmu_pv.c | 2 +- arch/xtensa/mm/init.c | 2 +- include/linux/bootmem.h| 2 +- mm/memblock.c | 2 +- mm/nobootmem.c | 4 ++-- mm/page_alloc.c| 2 +- mm/page_poison.c | 2 +- 35 files changed, 39 insertions(+), 39 deletions(-) diff --git a/arch/alpha/mm/init.c b/arch/alpha/mm/init.c index 9d74520..853d153 100644 --- a/arch/alpha/mm/init.c +++ b/arch/alpha/mm/init.c @@ -282,7 +282,7 @@ mem_init(void) { set_max_mapnr(max_low_pfn); high_memory = (void *) __va(max_low_pfn * PAGE_SIZE); - free_all_bootmem(); + memblock_free_all(); mem_init_print_info(NULL); } diff --git a/arch/arc/mm/init.c b/arch/arc/mm/init.c index ba14506..0f29c65 100644 --- a/arch/arc/mm/init.c +++ b/arch/arc/mm/init.c @@ -218,7 +218,7 @@ void __init mem_init(void) free_highmem_page(pfn_to_page(tmp)); #endif - free_all_bootmem(); + memblock_free_all(); mem_init_print_info(NULL); } diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c index 0cc8e04..d421a10 100644 --- a/arch/arm/mm/init.c +++ b/arch/arm/mm/init.c @@ -508,7 
+508,7 @@ void __init mem_init(void) /* this will put all unused low memory onto the freelists */ free_unused_memmap(); - free_all_bootmem(); + memblock_free_all(); #ifdef CONFIG_SA /* now that our DMA memory is actually so designated, we can free it */ diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index e335452..ae21849 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -601,7 +601,7 @@ void __init mem_init(void) free_unused_memmap(); #endif /* this will put all unused low memory onto the freelists */ - free_all_bootmem(); + memblock_free_all(); kexec_reserve_crashkres_pages(); diff --git a/arch/c6x/mm/init.c b/arch/c6x/mm/init.c index dc369ad..3383df8 100644 --- a/arch/c6x/mm/init.c +++ b/arch/c6x/mm/init.c @@ -62,7 +62,7 @@ void __init mem_init(void) high_memory = (void *)(memory_end & PAGE_MASK); /* this will put all memory onto the freelists */ - free_all_bootmem(); + memblock_free_all(); mem_init_print_info(NULL); } diff --git a/arch/h8300/mm/init.c b/arch/h8300/mm/init.c index 5d31ac9..f2bf448 100644 --- a/arch/h8300/mm/init.c +++ b/arch/h8300/mm/init.c @@ -96,7 +96,7 @@ void __init mem_init(void) max_mapnr = MAP_NR(high_memory); /* this will put all low memory onto the freelists */ - free_all_bootmem(); + memblock_free_all(); mem_init_print_info(NULL); } diff --git a/arch/hexagon/mm/init.c b/arch/hexagon/mm/init.c index d789b9c..88643fa 100644 --- a/arch/hexagon/mm/init.c +++ b/arch/hexagon/mm/init.c @@ -68,7 +68,7 @@ unsigned long long kmap_generation; void __init mem_init(void) { /* No idea where this is actually declared. Seems to evade LXR. 
*/ - free_all_bootmem(); + memblock_free_all(); mem_init_print_info(NULL); /* diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c index 2169ca5..43ea4a4 100644 --- a/arch/ia64/mm/init.c +++ b/arch/ia64/mm/init.c @@ -627,7 +627,7 @@ mem_init (void) set_max_mapnr(max_low_pfn); high_memory = __va(max_low_pfn * PAGE_SIZE); - free_all_bootmem(); + memblock_free_all(); mem_init_print_info(NULL); /*
[PATCH 24/30] memblock: replace free_bootmem_late with memblock_free_late
The free_bootmem_late and memblock_free_late do exactly the same thing: they iterate over a range and give pages to the page allocator. Replace calls to free_bootmem_late with calls to memblock_free_late and remove the bootmem variant. Signed-off-by: Mike Rapoport Acked-by: Michal Hocko --- arch/sparc/kernel/mdesc.c | 3 ++- arch/x86/platform/efi/quirks.c | 6 +++--- drivers/firmware/efi/apple-properties.c | 2 +- include/linux/bootmem.h | 2 -- mm/nobootmem.c | 24 5 files changed, 6 insertions(+), 31 deletions(-) diff --git a/arch/sparc/kernel/mdesc.c b/arch/sparc/kernel/mdesc.c index 59131e7..a41526b 100644 --- a/arch/sparc/kernel/mdesc.c +++ b/arch/sparc/kernel/mdesc.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include @@ -190,7 +191,7 @@ static void __init mdesc_memblock_free(struct mdesc_handle *hp) alloc_size = PAGE_ALIGN(hp->handle_size); start = __pa(hp); - free_bootmem_late(start, alloc_size); + memblock_free_late(start, alloc_size); } static struct mdesc_mem_ops memblock_mdesc_ops = { diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c index 844d31c..7b4854c 100644 --- a/arch/x86/platform/efi/quirks.c +++ b/arch/x86/platform/efi/quirks.c @@ -332,7 +332,7 @@ void __init efi_reserve_boot_services(void) /* * Because the following memblock_reserve() is paired -* with free_bootmem_late() for this region in +* with memblock_free_late() for this region in * efi_free_boot_services(), we must be extremely * careful not to reserve, and subsequently free, * critical regions of memory (like the kernel image) or @@ -363,7 +363,7 @@ void __init efi_reserve_boot_services(void) * doesn't make sense as far as the firmware is * concerned, but it does provide us with a way to tag * those regions that must not be paired with -* free_bootmem_late(). +* memblock_free_late(). 
*/ md->attribute |= EFI_MEMORY_RUNTIME; } @@ -413,7 +413,7 @@ void __init efi_free_boot_services(void) size -= rm_size; } - free_bootmem_late(start, size); + memblock_free_late(start, size); } if (!num_entries) diff --git a/drivers/firmware/efi/apple-properties.c b/drivers/firmware/efi/apple-properties.c index 60a9571..2b675f7 100644 --- a/drivers/firmware/efi/apple-properties.c +++ b/drivers/firmware/efi/apple-properties.c @@ -235,7 +235,7 @@ static int __init map_properties(void) */ data->len = 0; memunmap(data); - free_bootmem_late(pa_data + sizeof(*data), data_len); + memblock_free_late(pa_data + sizeof(*data), data_len); return ret; } diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h index 706cf8e..bcc7e2f 100644 --- a/include/linux/bootmem.h +++ b/include/linux/bootmem.h @@ -30,8 +30,6 @@ extern unsigned long free_all_bootmem(void); extern void reset_node_managed_pages(pg_data_t *pgdat); extern void reset_all_zones_managed_pages(void); -extern void free_bootmem_late(unsigned long physaddr, unsigned long size); - /* We are using top down, so it is safe to use 0 here */ #define BOOTMEM_LOW_LIMIT 0 diff --git a/mm/nobootmem.c b/mm/nobootmem.c index 85e1822..ee0f7fc 100644 --- a/mm/nobootmem.c +++ b/mm/nobootmem.c @@ -33,30 +33,6 @@ unsigned long min_low_pfn; unsigned long max_pfn; unsigned long long max_possible_pfn; -/** - * free_bootmem_late - free bootmem pages directly to page allocator - * @addr: starting address of the range - * @size: size of the range in bytes - * - * This is only useful when the bootmem allocator has already been torn - * down, but we are still initializing the system. Pages are given directly - * to the page allocator, no bootmem metadata is updated because it is gone. 
- */ -void __init free_bootmem_late(unsigned long addr, unsigned long size) -{ - unsigned long cursor, end; - - kmemleak_free_part_phys(addr, size); - - cursor = PFN_UP(addr); - end = PFN_DOWN(addr + size); - - for (; cursor < end; cursor++) { - __free_pages_bootmem(pfn_to_page(cursor), cursor, 0); - totalram_pages++; - } -} - static void __init __free_pages_memory(unsigned long start, unsigned long end) { int order; -- 2.7.4
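The removed free_bootmem_late() shows the range arithmetic both helpers share: the start address is rounded up to a page boundary and the end address is rounded down, so only pages that lie entirely inside the range are handed to the page allocator. A small userspace sketch of that computation follows. The names TOY_PAGE_SHIFT, pfn_up(), pfn_down() and whole_pages_in_range() are hypothetical, and the 4 KiB page size is an assumption made for the example.

```c
#define TOY_PAGE_SHIFT 12UL /* assumed 4 KiB pages */
#define TOY_PAGE_SIZE  (1UL << TOY_PAGE_SHIFT)

/* Round an address up to the next page frame number. */
static unsigned long pfn_up(unsigned long addr)
{
	return (addr + TOY_PAGE_SIZE - 1UL) >> TOY_PAGE_SHIFT;
}

/* Round an address down to its page frame number. */
static unsigned long pfn_down(unsigned long addr)
{
	return addr >> TOY_PAGE_SHIFT;
}

/* Count the pages that lie entirely inside [addr, addr + size). */
unsigned long whole_pages_in_range(unsigned long addr, unsigned long size)
{
	unsigned long cursor = pfn_up(addr);
	unsigned long end = pfn_down(addr + size);
	unsigned long count = 0;

	/* The kernel loop frees one page per iteration; here we only count. */
	for (; cursor < end; cursor++)
		count++;

	return count;
}
```

A range that starts and ends in the middle of a page loses both partial pages, and a range shorter than one page that straddles no full page releases nothing.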
[PATCH 23/30] memblock: replace free_bootmem{_node} with memblock_free
The free_bootmem and free_bootmem_node are merely wrappers for memblock_free. Replace their usage with a call to memblock_free using the following semantic patch: @@ expression e1, e2, e3; @@ ( - free_bootmem(e1, e2) + memblock_free(e1, e2) | - free_bootmem_node(e1, e2, e3) + memblock_free(e2, e3) ) Signed-off-by: Mike Rapoport Acked-by: Michal Hocko --- arch/alpha/kernel/core_irongate.c | 3 +-- arch/arm64/mm/init.c | 2 +- arch/mips/kernel/setup.c | 2 +- arch/powerpc/kernel/setup_64.c| 2 +- arch/sparc/kernel/smp_64.c| 2 +- arch/um/kernel/mem.c | 3 ++- arch/unicore32/mm/init.c | 2 +- arch/x86/kernel/setup_percpu.c| 3 ++- arch/x86/kernel/tce_64.c | 3 ++- arch/x86/xen/p2m.c| 3 ++- drivers/macintosh/smu.c | 2 +- drivers/usb/early/xhci-dbc.c | 11 ++- drivers/xen/swiotlb-xen.c | 4 +++- include/linux/bootmem.h | 4 mm/nobootmem.c| 30 -- 15 files changed, 24 insertions(+), 52 deletions(-) diff --git a/arch/alpha/kernel/core_irongate.c b/arch/alpha/kernel/core_irongate.c index f709866..35572be 100644 --- a/arch/alpha/kernel/core_irongate.c +++ b/arch/alpha/kernel/core_irongate.c @@ -234,8 +234,7 @@ albacore_init_arch(void) unsigned long size; size = initrd_end - initrd_start; - free_bootmem_node(NODE_DATA(0), __pa(initrd_start), - PAGE_ALIGN(size)); + memblock_free(__pa(initrd_start), PAGE_ALIGN(size)); if (!move_initrd(pci_mem)) printk("irongate_init_arch: initrd too big " "(%ldK)\ndisabling initrd\n", diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 787e279..e335452 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -538,7 +538,7 @@ static inline void free_memmap(unsigned long start_pfn, unsigned long end_pfn) * memmap array. 
*/ if (pg < pgend) - free_bootmem(pg, pgend - pg); + memblock_free(pg, pgend - pg); } /* diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c index a6bc2f6..86c9eda 100644 --- a/arch/mips/kernel/setup.c +++ b/arch/mips/kernel/setup.c @@ -561,7 +561,7 @@ static void __init bootmem_init(void) extern void show_kernel_relocation(const char *level); offset = __pa_symbol(_text) - __pa_symbol(VMLINUX_LOAD_ADDRESS); - free_bootmem(__pa_symbol(VMLINUX_LOAD_ADDRESS), offset); + memblock_free(__pa_symbol(VMLINUX_LOAD_ADDRESS), offset); #if defined(CONFIG_DEBUG_KERNEL) && defined(CONFIG_DEBUG_INFO) /* diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index 6add560..e564b27 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -765,7 +765,7 @@ static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size, size_t align) static void __init pcpu_fc_free(void *ptr, size_t size) { - free_bootmem(__pa(ptr), size); + memblock_free(__pa(ptr), size); } static int pcpu_cpu_distance(unsigned int from, unsigned int to) diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c index 337febd..a087a6a 100644 --- a/arch/sparc/kernel/smp_64.c +++ b/arch/sparc/kernel/smp_64.c @@ -1607,7 +1607,7 @@ static void * __init pcpu_alloc_bootmem(unsigned int cpu, size_t size, static void __init pcpu_free_bootmem(void *ptr, size_t size) { - free_bootmem(__pa(ptr), size); + memblock_free(__pa(ptr), size); } static int __init pcpu_cpu_distance(unsigned int from, unsigned int to) diff --git a/arch/um/kernel/mem.c b/arch/um/kernel/mem.c index 185f6bb..3555c13 100644 --- a/arch/um/kernel/mem.c +++ b/arch/um/kernel/mem.c @@ -6,6 +6,7 @@ #include #include #include +#include #include #include #include @@ -46,7 +47,7 @@ void __init mem_init(void) */ brk_end = (unsigned long) UML_ROUND_UP(sbrk(0)); map_memory(brk_end, __pa(brk_end), uml_reserved - brk_end, 1, 1, 0); - free_bootmem(__pa(brk_end), uml_reserved - brk_end); + 
memblock_free(__pa(brk_end), uml_reserved - brk_end); uml_reserved = brk_end; /* this will put all low memory onto the freelists */ diff --git a/arch/unicore32/mm/init.c b/arch/unicore32/mm/init.c index 44ccc15..4c572ab 100644 --- a/arch/unicore32/mm/init.c +++ b/arch/unicore32/mm/init.c @@ -241,7 +241,7 @@ free_memmap(unsigned long start_pfn, unsigned long end_pfn) * free the section of the memmap array. */ if (pg < pgend) - free_bootmem(pg, pgend - pg); + memblock_free(pg, pgend - pg); } /* diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c index
[PATCH 22/30] mm: nobootmem: remove bootmem allocation APIs
The bootmem compatibility APIs are not used and can be removed. Signed-off-by: Mike Rapoport Acked-by: Michal Hocko --- include/linux/bootmem.h | 47 -- mm/nobootmem.c | 224 2 files changed, 271 deletions(-) diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h index c97c105..73f1272 100644 --- a/include/linux/bootmem.h +++ b/include/linux/bootmem.h @@ -36,33 +36,6 @@ extern void free_bootmem_node(pg_data_t *pgdat, extern void free_bootmem(unsigned long physaddr, unsigned long size); extern void free_bootmem_late(unsigned long physaddr, unsigned long size); -extern void *__alloc_bootmem(unsigned long size, -unsigned long align, -unsigned long goal); -extern void *__alloc_bootmem_nopanic(unsigned long size, -unsigned long align, -unsigned long goal) __malloc; -extern void *__alloc_bootmem_node(pg_data_t *pgdat, - unsigned long size, - unsigned long align, - unsigned long goal) __malloc; -void *__alloc_bootmem_node_high(pg_data_t *pgdat, - unsigned long size, - unsigned long align, - unsigned long goal) __malloc; -extern void *__alloc_bootmem_node_nopanic(pg_data_t *pgdat, - unsigned long size, - unsigned long align, - unsigned long goal) __malloc; -void *___alloc_bootmem_node_nopanic(pg_data_t *pgdat, - unsigned long size, - unsigned long align, - unsigned long goal, - unsigned long limit) __malloc; -extern void *__alloc_bootmem_low(unsigned long size, -unsigned long align, -unsigned long goal) __malloc; - /* We are using top down, so it is safe to use 0 here */ #define BOOTMEM_LOW_LIMIT 0 @@ -70,26 +43,6 @@ extern void *__alloc_bootmem_low(unsigned long size, #define ARCH_LOW_ADDRESS_LIMIT 0xUL #endif -#define alloc_bootmem(x) \ - __alloc_bootmem(x, SMP_CACHE_BYTES, BOOTMEM_LOW_LIMIT) -#define alloc_bootmem_align(x, align) \ - __alloc_bootmem(x, align, BOOTMEM_LOW_LIMIT) -#define alloc_bootmem_pages(x) \ - __alloc_bootmem(x, PAGE_SIZE, BOOTMEM_LOW_LIMIT) -#define alloc_bootmem_pages_nopanic(x) \ - __alloc_bootmem_nopanic(x, PAGE_SIZE, BOOTMEM_LOW_LIMIT) 
-#define alloc_bootmem_node(pgdat, x) \ - __alloc_bootmem_node(pgdat, x, SMP_CACHE_BYTES, BOOTMEM_LOW_LIMIT) -#define alloc_bootmem_node_nopanic(pgdat, x) \ - __alloc_bootmem_node_nopanic(pgdat, x, SMP_CACHE_BYTES, BOOTMEM_LOW_LIMIT) -#define alloc_bootmem_pages_node(pgdat, x) \ - __alloc_bootmem_node(pgdat, x, PAGE_SIZE, BOOTMEM_LOW_LIMIT) - -#define alloc_bootmem_low(x) \ - __alloc_bootmem_low(x, SMP_CACHE_BYTES, 0) -#define alloc_bootmem_low_pages(x) \ - __alloc_bootmem_low(x, PAGE_SIZE, 0) - /* FIXME: use MEMBLOCK_ALLOC_* variants here */ #define BOOTMEM_ALLOC_ACCESSIBLE 0 #define BOOTMEM_ALLOC_ANYWHERE (~(phys_addr_t)0) diff --git a/mm/nobootmem.c b/mm/nobootmem.c index 44ce7de..bc38e56 100644 --- a/mm/nobootmem.c +++ b/mm/nobootmem.c @@ -33,41 +33,6 @@ unsigned long min_low_pfn; unsigned long max_pfn; unsigned long long max_possible_pfn; -static void * __init __alloc_memory_core_early(int nid, u64 size, u64 align, - u64 goal, u64 limit) -{ - void *ptr; - u64 addr; - enum memblock_flags flags = choose_memblock_flags(); - - if (limit > memblock.current_limit) - limit = memblock.current_limit; - -again: - addr = memblock_find_in_range_node(size, align, goal, limit, nid, - flags); - if (!addr && (flags & MEMBLOCK_MIRROR)) { - flags &= ~MEMBLOCK_MIRROR; - pr_warn("Could not allocate %pap bytes of mirrored memory\n", - ); - goto again; - } - if (!addr) - return NULL; - - if (memblock_reserve(addr, size)) - return NULL; - - ptr = phys_to_virt(addr); - memset(ptr, 0, size); - /* -* The min_count is set to 0 so that bootmem allocated blocks -* are never reported as leaks. -*/ - kmemleak_alloc(ptr, size, 0, 0); - return ptr; -} - /** * free_bootmem_late - free bootmem pages directly to page allocator * @addr: starting address of the range @@ -215,192 +180,3 @@ void __init free_bootmem(unsigned long addr, unsigned long size) { memblock_free(addr, size); } - -static void * __init
[PATCH 21/30] memblock: replace alloc_bootmem with memblock_alloc
The alloc_bootmem(size) is a shortcut for allocation of SMP_CACHE_BYTES aligned memory. When the align parameter of memblock_alloc() is 0, the alignment is implicitly set to SMP_CACHE_BYTES and thus alloc_bootmem(size) and memblock_alloc(size, 0) are equivalent. The conversion is done using the following semantic patch: @@ expression size; @@ - alloc_bootmem(size) + memblock_alloc(size, 0) Signed-off-by: Mike Rapoport --- arch/alpha/kernel/core_marvel.c | 4 ++-- arch/alpha/kernel/pci-noop.c| 4 ++-- arch/alpha/kernel/pci.c | 4 ++-- arch/alpha/kernel/pci_iommu.c | 4 ++-- arch/ia64/kernel/mca.c | 4 ++-- arch/ia64/mm/tlb.c | 4 ++-- arch/m68k/sun3/sun3dvma.c | 3 ++- arch/microblaze/mm/init.c | 2 +- arch/mips/kernel/setup.c| 2 +- arch/um/drivers/net_kern.c | 2 +- arch/um/drivers/vector_kern.c | 2 +- arch/um/kernel/initrd.c | 2 +- arch/x86/kernel/acpi/boot.c | 3 ++- arch/x86/kernel/apic/io_apic.c | 2 +- arch/x86/kernel/e820.c | 2 +- arch/x86/platform/olpc/olpc_dt.c| 2 +- arch/xtensa/platforms/iss/network.c | 2 +- drivers/macintosh/smu.c | 2 +- init/main.c | 4 ++-- 19 files changed, 28 insertions(+), 26 deletions(-) diff --git a/arch/alpha/kernel/core_marvel.c b/arch/alpha/kernel/core_marvel.c index bdebb8c2..1f00c94 100644 --- a/arch/alpha/kernel/core_marvel.c +++ b/arch/alpha/kernel/core_marvel.c @@ -82,7 +82,7 @@ mk_resource_name(int pe, int port, char *str) char *name; sprintf(tmp, "PCI %s PE %d PORT %d", str, pe, port); - name = alloc_bootmem(strlen(tmp) + 1); + name = memblock_alloc(strlen(tmp) + 1, 0); strcpy(name, tmp); return name; @@ -117,7 +117,7 @@ alloc_io7(unsigned int pe) return NULL; } - io7 = alloc_bootmem(sizeof(*io7)); + io7 = memblock_alloc(sizeof(*io7), 0); io7->pe = pe; raw_spin_lock_init(>irq_lock); diff --git a/arch/alpha/kernel/pci-noop.c b/arch/alpha/kernel/pci-noop.c index c7c5879..59cbfc2 100644 --- a/arch/alpha/kernel/pci-noop.c +++ b/arch/alpha/kernel/pci-noop.c @@ -33,7 +33,7 @@ alloc_pci_controller(void) { struct pci_controller *hose; - hose 
= alloc_bootmem(sizeof(*hose)); + hose = memblock_alloc(sizeof(*hose), 0); *hose_tail = hose; hose_tail = >next; @@ -44,7 +44,7 @@ alloc_pci_controller(void) struct resource * __init alloc_resource(void) { - return alloc_bootmem(sizeof(struct resource)); + return memblock_alloc(sizeof(struct resource), 0); } SYSCALL_DEFINE3(pciconfig_iobase, long, which, unsigned long, bus, diff --git a/arch/alpha/kernel/pci.c b/arch/alpha/kernel/pci.c index c668c3b..4cc3eb9 100644 --- a/arch/alpha/kernel/pci.c +++ b/arch/alpha/kernel/pci.c @@ -392,7 +392,7 @@ alloc_pci_controller(void) { struct pci_controller *hose; - hose = alloc_bootmem(sizeof(*hose)); + hose = memblock_alloc(sizeof(*hose), 0); *hose_tail = hose; hose_tail = >next; @@ -403,7 +403,7 @@ alloc_pci_controller(void) struct resource * __init alloc_resource(void) { - return alloc_bootmem(sizeof(struct resource)); + return memblock_alloc(sizeof(struct resource), 0); } diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c index 0c05493..5d178c7 100644 --- a/arch/alpha/kernel/pci_iommu.c +++ b/arch/alpha/kernel/pci_iommu.c @@ -79,7 +79,7 @@ iommu_arena_new_node(int nid, struct pci_controller *hose, dma_addr_t base, printk("%s: couldn't allocate arena from node %d\n" "falling back to system-wide allocation\n", __func__, nid); - arena = alloc_bootmem(sizeof(*arena)); + arena = memblock_alloc(sizeof(*arena), 0); } arena->ptes = memblock_alloc_node(sizeof(*arena), align, nid); @@ -92,7 +92,7 @@ iommu_arena_new_node(int nid, struct pci_controller *hose, dma_addr_t base, #else /* CONFIG_DISCONTIGMEM */ - arena = alloc_bootmem(sizeof(*arena)); + arena = memblock_alloc(sizeof(*arena), 0); arena->ptes = memblock_alloc_from(mem_size, align, 0); #endif /* CONFIG_DISCONTIGMEM */ diff --git a/arch/ia64/kernel/mca.c b/arch/ia64/kernel/mca.c index 5586926..7120976 100644 --- a/arch/ia64/kernel/mca.c +++ b/arch/ia64/kernel/mca.c @@ -361,9 +361,9 @@ static ia64_state_log_t ia64_state_log[IA64_MAX_LOG_TYPES]; #define 
IA64_LOG_ALLOCATE(it, size) \ {ia64_state_log[it].isl_log[IA64_LOG_CURR_INDEX(it)] = \ - (ia64_err_rec_t *)alloc_bootmem(size); \ + (ia64_err_rec_t *)memblock_alloc(size, 0); \ ia64_state_log[it].isl_log[IA64_LOG_NEXT_INDEX(it)] = \ - (ia64_err_rec_t *)alloc_bootmem(size);} + (ia64_err_rec_t *)memblock_alloc(size, 0);} #define
[PATCH 20/30] memblock: replace __alloc_bootmem with memblock_alloc_from
The functions are equivalent, but the latter does not require the nobootmem translation layer. The conversion is done using the following semantic patch: @@ expression size, align, goal; @@ - __alloc_bootmem(size, align, goal) + memblock_alloc_from(size, align, goal) Signed-off-by: Mike Rapoport Acked-by: Michal Hocko --- arch/alpha/kernel/core_cia.c | 2 +- arch/alpha/kernel/pci_iommu.c | 4 ++-- arch/alpha/kernel/setup.c | 2 +- arch/ia64/kernel/mca.c| 4 ++-- arch/ia64/mm/contig.c | 5 +++-- arch/mips/kernel/traps.c | 2 +- arch/sparc/kernel/prom_32.c | 2 +- arch/sparc/kernel/smp_64.c| 10 +- arch/sparc/mm/init_32.c | 2 +- arch/sparc/mm/init_64.c | 9 ++--- arch/sparc/mm/srmmu.c | 10 +- include/linux/bootmem.h | 8 12 files changed, 36 insertions(+), 24 deletions(-) diff --git a/arch/alpha/kernel/core_cia.c b/arch/alpha/kernel/core_cia.c index 4b38386..026ee95 100644 --- a/arch/alpha/kernel/core_cia.c +++ b/arch/alpha/kernel/core_cia.c @@ -331,7 +331,7 @@ cia_prepare_tbia_workaround(int window) long i; /* Use minimal 1K map. 
*/ - ppte = __alloc_bootmem(CIA_BROKEN_TBIA_SIZE, 32768, 0); + ppte = memblock_alloc_from(CIA_BROKEN_TBIA_SIZE, 32768, 0); pte = (virt_to_phys(ppte) >> (PAGE_SHIFT - 1)) | 1; for (i = 0; i < CIA_BROKEN_TBIA_SIZE / sizeof(unsigned long); ++i) diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c index b52d76f..0c05493 100644 --- a/arch/alpha/kernel/pci_iommu.c +++ b/arch/alpha/kernel/pci_iommu.c @@ -87,13 +87,13 @@ iommu_arena_new_node(int nid, struct pci_controller *hose, dma_addr_t base, printk("%s: couldn't allocate arena ptes from node %d\n" "falling back to system-wide allocation\n", __func__, nid); - arena->ptes = __alloc_bootmem(mem_size, align, 0); + arena->ptes = memblock_alloc_from(mem_size, align, 0); } #else /* CONFIG_DISCONTIGMEM */ arena = alloc_bootmem(sizeof(*arena)); - arena->ptes = __alloc_bootmem(mem_size, align, 0); + arena->ptes = memblock_alloc_from(mem_size, align, 0); #endif /* CONFIG_DISCONTIGMEM */ diff --git a/arch/alpha/kernel/setup.c b/arch/alpha/kernel/setup.c index 4f0d944..64c06a0 100644 --- a/arch/alpha/kernel/setup.c +++ b/arch/alpha/kernel/setup.c @@ -294,7 +294,7 @@ move_initrd(unsigned long mem_limit) unsigned long size; size = initrd_end - initrd_start; - start = __alloc_bootmem(PAGE_ALIGN(size), PAGE_SIZE, 0); + start = memblock_alloc_from(PAGE_ALIGN(size), PAGE_SIZE, 0); if (!start || __pa(start) + size > mem_limit) { initrd_start = initrd_end = 0; return NULL; diff --git a/arch/ia64/kernel/mca.c b/arch/ia64/kernel/mca.c index 6115464..5586926 100644 --- a/arch/ia64/kernel/mca.c +++ b/arch/ia64/kernel/mca.c @@ -1835,8 +1835,8 @@ format_mca_init_stack(void *mca_data, unsigned long offset, /* Caller prevents this from being called after init */ static void * __ref mca_bootmem(void) { - return __alloc_bootmem(sizeof(struct ia64_mca_cpu), - KERNEL_STACK_SIZE, 0); + return memblock_alloc_from(sizeof(struct ia64_mca_cpu), + KERNEL_STACK_SIZE, 0); } /* Do per-CPU MCA-related initialization. 
*/ diff --git a/arch/ia64/mm/contig.c b/arch/ia64/mm/contig.c index e2e40bb..9e5c23a 100644 --- a/arch/ia64/mm/contig.c +++ b/arch/ia64/mm/contig.c @@ -85,8 +85,9 @@ void *per_cpu_init(void) static inline void alloc_per_cpu_data(void) { - cpu_data = __alloc_bootmem(PERCPU_PAGE_SIZE * num_possible_cpus(), - PERCPU_PAGE_SIZE, __pa(MAX_DMA_ADDRESS)); + cpu_data = memblock_alloc_from(PERCPU_PAGE_SIZE * num_possible_cpus(), + PERCPU_PAGE_SIZE, + __pa(MAX_DMA_ADDRESS)); } /** diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c index 5feef28..623dc18 100644 --- a/arch/mips/kernel/traps.c +++ b/arch/mips/kernel/traps.c @@ -2263,7 +2263,7 @@ void __init trap_init(void) memblock_set_bottom_up(true); ebase = (unsigned long) - __alloc_bootmem(size, 1 << fls(size), 0); + memblock_alloc_from(size, 1 << fls(size), 0); memblock_set_bottom_up(false); /* diff --git a/arch/sparc/kernel/prom_32.c b/arch/sparc/kernel/prom_32.c index b51cbb9..4389944 100644 --- a/arch/sparc/kernel/prom_32.c +++ b/arch/sparc/kernel/prom_32.c @@ -32,7 +32,7 @@ void * __init prom_early_alloc(unsigned long size) { void *ret; - ret = __alloc_bootmem(size, SMP_CACHE_BYTES, 0UL); + ret = memblock_alloc_from(size, SMP_CACHE_BYTES, 0UL); if (ret != NULL) memset(ret, 0,
[PATCH 19/30] memblock: replace alloc_bootmem_pages with memblock_alloc
The alloc_bootmem_pages() function allocates PAGE_SIZE aligned memory. memblock_alloc() with alignment set to PAGE_SIZE does exactly the same thing. The conversion is done using the following semantic patch: @@ expression e; @@ - alloc_bootmem_pages(e) + memblock_alloc(e, PAGE_SIZE) Signed-off-by: Mike Rapoport Acked-by: Michal Hocko --- arch/c6x/mm/init.c | 3 ++- arch/h8300/mm/init.c | 2 +- arch/m68k/mm/init.c| 2 +- arch/m68k/mm/mcfmmu.c | 4 ++-- arch/m68k/mm/motorola.c| 2 +- arch/m68k/mm/sun3mmu.c | 4 ++-- arch/sh/mm/init.c | 4 ++-- arch/x86/kernel/apic/io_apic.c | 3 ++- arch/x86/mm/init_64.c | 2 +- drivers/xen/swiotlb-xen.c | 3 ++- 10 files changed, 16 insertions(+), 13 deletions(-) diff --git a/arch/c6x/mm/init.c b/arch/c6x/mm/init.c index 4cc72b0..dc369ad 100644 --- a/arch/c6x/mm/init.c +++ b/arch/c6x/mm/init.c @@ -38,7 +38,8 @@ void __init paging_init(void) struct pglist_data *pgdat = NODE_DATA(0); unsigned long zones_size[MAX_NR_ZONES] = {0, }; - empty_zero_page = (unsigned long) alloc_bootmem_pages(PAGE_SIZE); + empty_zero_page = (unsigned long) memblock_alloc(PAGE_SIZE, + PAGE_SIZE); memset((void *)empty_zero_page, 0, PAGE_SIZE); /* diff --git a/arch/h8300/mm/init.c b/arch/h8300/mm/init.c index 015287a..5d31ac9 100644 --- a/arch/h8300/mm/init.c +++ b/arch/h8300/mm/init.c @@ -67,7 +67,7 @@ void __init paging_init(void) * Initialize the bad page table and bad page to point * to a couple of allocated pages. 
*/ - empty_zero_page = (unsigned long)alloc_bootmem_pages(PAGE_SIZE); + empty_zero_page = (unsigned long)memblock_alloc(PAGE_SIZE, PAGE_SIZE); memset((void *)empty_zero_page, 0, PAGE_SIZE); /* diff --git a/arch/m68k/mm/init.c b/arch/m68k/mm/init.c index 38e2b27..977363e 100644 --- a/arch/m68k/mm/init.c +++ b/arch/m68k/mm/init.c @@ -93,7 +93,7 @@ void __init paging_init(void) high_memory = (void *) end_mem; - empty_zero_page = alloc_bootmem_pages(PAGE_SIZE); + empty_zero_page = memblock_alloc(PAGE_SIZE, PAGE_SIZE); /* * Set up SFC/DFC registers (user data space). diff --git a/arch/m68k/mm/mcfmmu.c b/arch/m68k/mm/mcfmmu.c index f5453d9..38a1d92 100644 --- a/arch/m68k/mm/mcfmmu.c +++ b/arch/m68k/mm/mcfmmu.c @@ -44,7 +44,7 @@ void __init paging_init(void) enum zone_type zone; int i; - empty_zero_page = (void *) alloc_bootmem_pages(PAGE_SIZE); + empty_zero_page = (void *) memblock_alloc(PAGE_SIZE, PAGE_SIZE); memset((void *) empty_zero_page, 0, PAGE_SIZE); pg_dir = swapper_pg_dir; @@ -52,7 +52,7 @@ void __init paging_init(void) size = num_pages * sizeof(pte_t); size = (size + PAGE_SIZE) & ~(PAGE_SIZE-1); - next_pgtable = (unsigned long) alloc_bootmem_pages(size); + next_pgtable = (unsigned long) memblock_alloc(size, PAGE_SIZE); bootmem_end = (next_pgtable + size + PAGE_SIZE) & PAGE_MASK; pg_dir += PAGE_OFFSET >> PGDIR_SHIFT; diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c index 8bcf57e..2113eec 100644 --- a/arch/m68k/mm/motorola.c +++ b/arch/m68k/mm/motorola.c @@ -276,7 +276,7 @@ void __init paging_init(void) * initialize the bad page table and bad page to point * to a couple of allocated pages */ - empty_zero_page = alloc_bootmem_pages(PAGE_SIZE); + empty_zero_page = memblock_alloc(PAGE_SIZE, PAGE_SIZE); /* * Set up SFC/DFC registers diff --git a/arch/m68k/mm/sun3mmu.c b/arch/m68k/mm/sun3mmu.c index 4a99799..19c05ab 100644 --- a/arch/m68k/mm/sun3mmu.c +++ b/arch/m68k/mm/sun3mmu.c @@ -45,7 +45,7 @@ void __init paging_init(void) unsigned long 
zones_size[MAX_NR_ZONES] = { 0, }; unsigned long size; - empty_zero_page = alloc_bootmem_pages(PAGE_SIZE); + empty_zero_page = memblock_alloc(PAGE_SIZE, PAGE_SIZE); address = PAGE_OFFSET; pg_dir = swapper_pg_dir; @@ -55,7 +55,7 @@ void __init paging_init(void) size = num_pages * sizeof(pte_t); size = (size + PAGE_SIZE) & ~(PAGE_SIZE-1); - next_pgtable = (unsigned long)alloc_bootmem_pages(size); + next_pgtable = (unsigned long)memblock_alloc(size, PAGE_SIZE); bootmem_end = (next_pgtable + size + PAGE_SIZE) & PAGE_MASK; /* Map whole memory from PAGE_OFFSET (0x0E00) */ diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c index 7713c08..c884b76 100644 --- a/arch/sh/mm/init.c +++ b/arch/sh/mm/init.c @@ -128,7 +128,7 @@ static pmd_t * __init one_md_table_init(pud_t *pud) if (pud_none(*pud)) { pmd_t *pmd; - pmd = alloc_bootmem_pages(PAGE_SIZE); + pmd = memblock_alloc(PAGE_SIZE, PAGE_SIZE); pud_populate(_mm, pud, pmd);
[PATCH 18/30] memblock: replace alloc_bootmem_low_pages with memblock_alloc_low
The alloc_bootmem_low_pages() function allocates PAGE_SIZE aligned regions from low memory. memblock_alloc_low() with alignment set to PAGE_SIZE does exactly the same thing. The conversion is done using the following semantic patch: @@ expression e; @@ - alloc_bootmem_low_pages(e) + memblock_alloc_low(e, PAGE_SIZE) Signed-off-by: Mike Rapoport Acked-by: Michal Hocko --- arch/arc/mm/highmem.c| 2 +- arch/m68k/atari/stram.c | 3 ++- arch/m68k/mm/motorola.c | 5 +++-- arch/mips/cavium-octeon/dma-octeon.c | 2 +- arch/mips/mm/init.c | 3 ++- arch/um/kernel/mem.c | 10 ++ arch/xtensa/mm/mmu.c | 2 +- 7 files changed, 16 insertions(+), 11 deletions(-) diff --git a/arch/arc/mm/highmem.c b/arch/arc/mm/highmem.c index 77ff64a..f582dc8 100644 --- a/arch/arc/mm/highmem.c +++ b/arch/arc/mm/highmem.c @@ -123,7 +123,7 @@ static noinline pte_t * __init alloc_kmap_pgtable(unsigned long kvaddr) pud_k = pud_offset(pgd_k, kvaddr); pmd_k = pmd_offset(pud_k, kvaddr); - pte_k = (pte_t *)alloc_bootmem_low_pages(PAGE_SIZE); + pte_k = (pte_t *)memblock_alloc_low(PAGE_SIZE, PAGE_SIZE); pmd_populate_kernel(_mm, pmd_k, pte_k); return pte_k; } diff --git a/arch/m68k/atari/stram.c b/arch/m68k/atari/stram.c index c83d664..1089d67 100644 --- a/arch/m68k/atari/stram.c +++ b/arch/m68k/atari/stram.c @@ -95,7 +95,8 @@ void __init atari_stram_reserve_pages(void *start_mem) { if (kernel_in_stram) { pr_debug("atari_stram pool: kernel in ST-RAM, using alloc_bootmem!\n"); - stram_pool.start = (resource_size_t)alloc_bootmem_low_pages(pool_size); + stram_pool.start = (resource_size_t)memblock_alloc_low(pool_size, + PAGE_SIZE); stram_pool.end = stram_pool.start + pool_size - 1; request_resource(_resource, _pool); stram_virt_offset = 0; diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c index 4e17ecb..8bcf57e 100644 --- a/arch/m68k/mm/motorola.c +++ b/arch/m68k/mm/motorola.c @@ -55,7 +55,7 @@ static pte_t * __init kernel_page_table(void) { pte_t *ptablep; - ptablep = (pte_t 
*)alloc_bootmem_low_pages(PAGE_SIZE); + ptablep = (pte_t *)memblock_alloc_low(PAGE_SIZE, PAGE_SIZE); clear_page(ptablep); __flush_page_to_ram(ptablep); @@ -95,7 +95,8 @@ static pmd_t * __init kernel_ptr_table(void) last_pgtable += PTRS_PER_PMD; if (((unsigned long)last_pgtable & ~PAGE_MASK) == 0) { - last_pgtable = (pmd_t *)alloc_bootmem_low_pages(PAGE_SIZE); + last_pgtable = (pmd_t *)memblock_alloc_low(PAGE_SIZE, + PAGE_SIZE); clear_page(last_pgtable); __flush_page_to_ram(last_pgtable); diff --git a/arch/mips/cavium-octeon/dma-octeon.c b/arch/mips/cavium-octeon/dma-octeon.c index 236833b..c44c1a6 100644 --- a/arch/mips/cavium-octeon/dma-octeon.c +++ b/arch/mips/cavium-octeon/dma-octeon.c @@ -244,7 +244,7 @@ void __init plat_swiotlb_setup(void) swiotlb_nslabs = ALIGN(swiotlb_nslabs, IO_TLB_SEGSIZE); swiotlbsize = swiotlb_nslabs << IO_TLB_SHIFT; - octeon_swiotlb = alloc_bootmem_low_pages(swiotlbsize); + octeon_swiotlb = memblock_alloc_low(swiotlbsize, PAGE_SIZE); if (swiotlb_init_with_tbl(octeon_swiotlb, swiotlb_nslabs, 1) == -ENOMEM) panic("Cannot allocate SWIOTLB buffer"); diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c index 400676c..a010fba7 100644 --- a/arch/mips/mm/init.c +++ b/arch/mips/mm/init.c @@ -244,7 +244,8 @@ void __init fixrange_init(unsigned long start, unsigned long end, pmd = (pmd_t *)pud; for (; (k < PTRS_PER_PMD) && (vaddr < end); pmd++, k++) { if (pmd_none(*pmd)) { - pte = (pte_t *) alloc_bootmem_low_pages(PAGE_SIZE); + pte = (pte_t *) memblock_alloc_low(PAGE_SIZE, + PAGE_SIZE); set_pmd(pmd, __pmd((unsigned long)pte)); BUG_ON(pte != pte_offset_kernel(pmd, 0)); } diff --git a/arch/um/kernel/mem.c b/arch/um/kernel/mem.c index 3c0e470..185f6bb 100644 --- a/arch/um/kernel/mem.c +++ b/arch/um/kernel/mem.c @@ -64,7 +64,8 @@ void __init mem_init(void) static void __init one_page_table_init(pmd_t *pmd) { if (pmd_none(*pmd)) { - pte_t *pte = (pte_t *) alloc_bootmem_low_pages(PAGE_SIZE); + pte_t *pte = (pte_t *) memblock_alloc_low(PAGE_SIZE, + 
PAGE_SIZE);
[PATCH 17/30] memblock: replace alloc_bootmem_node with memblock_alloc_node
Both functions attempt to allocate memory with specified alignment from a particular node. If the allocation from that node fails, they both fall back to allocating from any node in the system. Usage of native memblock API eliminates the nobootmem translation layer. Signed-off-by: Mike Rapoport Acked-by: Michal Hocko --- arch/alpha/kernel/pci_iommu.c | 4 ++-- arch/ia64/sn/kernel/io_common.c | 7 ++- arch/ia64/sn/kernel/setup.c | 4 ++-- 3 files changed, 6 insertions(+), 9 deletions(-) diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c index 6923b0d..b52d76f 100644 --- a/arch/alpha/kernel/pci_iommu.c +++ b/arch/alpha/kernel/pci_iommu.c @@ -74,7 +74,7 @@ iommu_arena_new_node(int nid, struct pci_controller *hose, dma_addr_t base, #ifdef CONFIG_DISCONTIGMEM - arena = alloc_bootmem_node(NODE_DATA(nid), sizeof(*arena)); + arena = memblock_alloc_node(sizeof(*arena), align, nid); if (!NODE_DATA(nid) || !arena) { printk("%s: couldn't allocate arena from node %d\n" "falling back to system-wide allocation\n", @@ -82,7 +82,7 @@ iommu_arena_new_node(int nid, struct pci_controller *hose, dma_addr_t base, arena = alloc_bootmem(sizeof(*arena)); } - arena->ptes = __alloc_bootmem_node(NODE_DATA(nid), mem_size, align, 0); + arena->ptes = memblock_alloc_node(sizeof(*arena), align, nid); if (!NODE_DATA(nid) || !arena->ptes) { printk("%s: couldn't allocate arena ptes from node %d\n" "falling back to system-wide allocation\n", diff --git a/arch/ia64/sn/kernel/io_common.c b/arch/ia64/sn/kernel/io_common.c index 102aaba..8b05d55 100644 --- a/arch/ia64/sn/kernel/io_common.c +++ b/arch/ia64/sn/kernel/io_common.c @@ -385,16 +385,13 @@ void __init hubdev_init_node(nodepda_t * npda, cnodeid_t node) { struct hubdev_info *hubdev_info; int size; - pg_data_t *pg; size = sizeof(struct hubdev_info); if (node >= num_online_nodes()) /* Headless/memless IO nodes */ - pg = NODE_DATA(0); - else - pg = NODE_DATA(node); + node = 0; - hubdev_info = (struct hubdev_info 
*)alloc_bootmem_node(pg, size); + hubdev_info = (struct hubdev_info *)memblock_alloc_node(size, 0, node); npda->pdinfo = (void *)hubdev_info; } diff --git a/arch/ia64/sn/kernel/setup.c b/arch/ia64/sn/kernel/setup.c index 5f6b6b4..ab2564f 100644 --- a/arch/ia64/sn/kernel/setup.c +++ b/arch/ia64/sn/kernel/setup.c @@ -511,7 +511,7 @@ static void __init sn_init_pdas(char **cmdline_p) */ for_each_online_node(cnode) { nodepdaindr[cnode] = - alloc_bootmem_node(NODE_DATA(cnode), sizeof(nodepda_t)); + memblock_alloc_node(sizeof(nodepda_t), 0, cnode); memset(nodepdaindr[cnode]->phys_cpuid, -1, sizeof(nodepdaindr[cnode]->phys_cpuid)); spin_lock_init([cnode]->ptc_lock); @@ -522,7 +522,7 @@ static void __init sn_init_pdas(char **cmdline_p) */ for (cnode = num_online_nodes(); cnode < num_cnodes; cnode++) nodepdaindr[cnode] = - alloc_bootmem_node(NODE_DATA(0), sizeof(nodepda_t)); + memblock_alloc_node(sizeof(nodepda_t), 0, 0); /* * Now copy the array of nodepda pointers to each nodepda. -- 2.7.4
[PATCH 16/30] memblock: replace __alloc_bootmem_node with appropriate memblock_ API
Use memblock_alloc_try_nid() whenever a goal (i.e. a minimal address) is
specified, and memblock_alloc_node() otherwise.

Signed-off-by: Mike Rapoport
---
 arch/ia64/mm/discontig.c       |  6 ++++--
 arch/powerpc/kernel/setup_64.c |  6 ++++--
 arch/sparc/kernel/setup_64.c   | 10 ++++------
 arch/sparc/kernel/smp_64.c     |  4 ++--
 4 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
index 1928d57..918dda9 100644
--- a/arch/ia64/mm/discontig.c
+++ b/arch/ia64/mm/discontig.c
@@ -451,8 +451,10 @@ static void __init *memory_less_node_alloc(int nid, unsigned long pernodesize)
 	if (bestnode == -1)
 		bestnode = anynode;
 
-	ptr = __alloc_bootmem_node(pgdat_list[bestnode], pernodesize,
-		PERCPU_PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
+	ptr = memblock_alloc_try_nid(pernodesize, PERCPU_PAGE_SIZE,
+				     __pa(MAX_DMA_ADDRESS),
+				     BOOTMEM_ALLOC_ACCESSIBLE,
+				     bestnode);
 
 	return ptr;
 }
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 6a501b2..6add560 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -757,8 +757,10 @@ void __init emergency_stack_init(void)
 
 static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size, size_t align)
 {
-	return __alloc_bootmem_node(NODE_DATA(early_cpu_to_node(cpu)), size, align,
-				    __pa(MAX_DMA_ADDRESS));
+	return memblock_alloc_try_nid(size, align, __pa(MAX_DMA_ADDRESS),
+				      BOOTMEM_ALLOC_ACCESSIBLE,
+				      early_cpu_to_node(cpu));
+
 }
 
 static void __init pcpu_fc_free(void *ptr, size_t size)
diff --git a/arch/sparc/kernel/setup_64.c b/arch/sparc/kernel/setup_64.c
index 206bf81..5fb11ea 100644
--- a/arch/sparc/kernel/setup_64.c
+++ b/arch/sparc/kernel/setup_64.c
@@ -622,12 +622,10 @@ void __init alloc_irqstack_bootmem(void)
 	for_each_possible_cpu(i) {
 		node = cpu_to_node(i);
 
-		softirq_stack[i] = __alloc_bootmem_node(NODE_DATA(node),
-							THREAD_SIZE,
-							THREAD_SIZE, 0);
-		hardirq_stack[i] = __alloc_bootmem_node(NODE_DATA(node),
-							THREAD_SIZE,
-							THREAD_SIZE, 0);
+		softirq_stack[i] = memblock_alloc_node(THREAD_SIZE,
+						       THREAD_SIZE, node);
+		hardirq_stack[i] = memblock_alloc_node(THREAD_SIZE,
+						       THREAD_SIZE, node);
 	}
 }
 
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index d3ea1f3..83ff88d 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -1594,8 +1594,8 @@ static void * __init pcpu_alloc_bootmem(unsigned int cpu, size_t size,
 		pr_debug("per cpu data for cpu%d %lu bytes at %016lx\n",
 			 cpu, size, __pa(ptr));
 	} else {
-		ptr = __alloc_bootmem_node(NODE_DATA(node),
-					   size, align, goal);
+		ptr = memblock_alloc_try_nid(size, align, goal,
+					     BOOTMEM_ALLOC_ACCESSIBLE, node);
 		pr_debug("per cpu data for cpu%d %lu bytes on node%d at "
 			 "%016lx\n", cpu, size, node, __pa(ptr));
 	}
-- 
2.7.4
[PATCH 15/30] memblock: replace alloc_bootmem_pages_node with memblock_alloc_node
The functions are equivalent, but the latter does not require the
nobootmem translation layer.

Signed-off-by: Mike Rapoport
Acked-by: Michal Hocko
---
 arch/ia64/mm/init.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 3b85c3e..2169ca5 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -447,19 +447,19 @@ int __init create_mem_map_page_table(u64 start, u64 end, void *arg)
 	for (address = start_page; address < end_page; address += PAGE_SIZE) {
 		pgd = pgd_offset_k(address);
 		if (pgd_none(*pgd))
-			pgd_populate(&init_mm, pgd, alloc_bootmem_pages_node(NODE_DATA(node), PAGE_SIZE));
+			pgd_populate(&init_mm, pgd, memblock_alloc_node(PAGE_SIZE, PAGE_SIZE, node));
 		pud = pud_offset(pgd, address);
 
 		if (pud_none(*pud))
-			pud_populate(&init_mm, pud, alloc_bootmem_pages_node(NODE_DATA(node), PAGE_SIZE));
+			pud_populate(&init_mm, pud, memblock_alloc_node(PAGE_SIZE, PAGE_SIZE, node));
 		pmd = pmd_offset(pud, address);
 
 		if (pmd_none(*pmd))
-			pmd_populate_kernel(&init_mm, pmd, alloc_bootmem_pages_node(NODE_DATA(node), PAGE_SIZE));
+			pmd_populate_kernel(&init_mm, pmd, memblock_alloc_node(PAGE_SIZE, PAGE_SIZE, node));
 		pte = pte_offset_kernel(pmd, address);
 
 		if (pte_none(*pte))
-			set_pte(pte, pfn_pte(__pa(alloc_bootmem_pages_node(NODE_DATA(node), PAGE_SIZE)) >> PAGE_SHIFT,
+			set_pte(pte, pfn_pte(__pa(memblock_alloc_node(PAGE_SIZE, PAGE_SIZE, node)) >> PAGE_SHIFT,
 					     PAGE_KERNEL));
 	}
 	return 0;
-- 
2.7.4
[PATCH 14/30] memblock: add align parameter to memblock_alloc_node()
With the align parameter, memblock_alloc_node() can be used as a drop-in
replacement for alloc_bootmem_pages_node() and __alloc_bootmem_node(),
which is done in the following patches.

Signed-off-by: Mike Rapoport
---
 include/linux/bootmem.h | 4 ++--
 mm/sparse.c             | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index 7d91f0f..3896af2 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -157,9 +157,9 @@ static inline void * __init memblock_alloc_from_nopanic(
 }
 
 static inline void * __init memblock_alloc_node(
-						phys_addr_t size, int nid)
+		phys_addr_t size, phys_addr_t align, int nid)
 {
-	return memblock_alloc_try_nid(size, 0, BOOTMEM_LOW_LIMIT,
+	return memblock_alloc_try_nid(size, align, BOOTMEM_LOW_LIMIT,
 				      BOOTMEM_ALLOC_ACCESSIBLE, nid);
 }
 
diff --git a/mm/sparse.c b/mm/sparse.c
index 04e97af..509828f 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -68,7 +68,7 @@ static noinline struct mem_section __ref *sparse_index_alloc(int nid)
 	if (slab_is_available())
 		section = kzalloc_node(array_size, GFP_KERNEL, nid);
 	else
-		section = memblock_alloc_node(array_size, nid);
+		section = memblock_alloc_node(array_size, 0, nid);
 
 	return section;
 }
-- 
2.7.4
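For readers following along outside the kernel tree, here is a minimal
userspace sketch of what the added parameter changes. It is not kernel
code: every toy_* name, the 64-byte cache-line size and the 4096-byte page
size are assumptions made for illustration, and the bump allocator merely
stands in for memblock_alloc_try_nid(). The "align == 0 means
SMP_CACHE_BYTES" rule is taken from the description of patch 21 in this
series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real values are arch- and config-specific. */
#define TOY_SMP_CACHE_BYTES 64u
#define TOY_PAGE_SIZE 4096u

/* A small page-aligned pool playing the role of early boot memory. */
static _Alignas(4096) unsigned char toy_pool[8 * 4096];
static size_t toy_used;

/*
 * Toy model of memblock_alloc_try_nid(): a bump allocator in which
 * align == 0 selects the cache-line default. The node id is ignored.
 */
static void *toy_alloc_try_nid(size_t size, size_t align, int nid)
{
	size_t start;

	(void)nid;
	if (align == 0)
		align = TOY_SMP_CACHE_BYTES;
	assert((align & (align - 1)) == 0);	/* power of two only */

	start = (toy_used + align - 1) & ~(align - 1);
	if (size > sizeof(toy_pool) || start > sizeof(toy_pool) - size)
		return NULL;
	toy_used = start + size;
	return toy_pool + start;
}

/* Shape of the wrapper before the patch: alignment is always the default. */
static void *toy_alloc_node_old(size_t size, int nid)
{
	return toy_alloc_try_nid(size, 0, nid);
}

/* Shape of the wrapper after the patch: the caller chooses the alignment. */
static void *toy_alloc_node(size_t size, size_t align, int nid)
{
	return toy_alloc_try_nid(size, align, nid);
}
```

Passing 0 as the new argument reproduces the old behaviour, which is why
the mm/sparse.c caller only gains a literal 0, while a caller that needs
page alignment can now pass it explicitly.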
[PATCH 13/30] memblock: replace __alloc_bootmem_nopanic with memblock_alloc_from_nopanic
When __alloc_bootmem_nopanic() is used with an explicit lower limit for
the allocation, it attempts to allocate memory at or above that limit and
falls back to allocation with no limit set. The
memblock_alloc_from_nopanic() does exactly the same thing and can be used
as a replacement for __alloc_bootmem_nopanic() in such cases.

Signed-off-by: Mike Rapoport
Acked-by: Michal Hocko
---
 arch/arc/kernel/unwind.c       | 4 ++--
 arch/x86/kernel/setup_percpu.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arc/kernel/unwind.c b/arch/arc/kernel/unwind.c
index 183391d..2a01dd1 100644
--- a/arch/arc/kernel/unwind.c
+++ b/arch/arc/kernel/unwind.c
@@ -181,8 +181,8 @@ static void init_unwind_hdr(struct unwind_table *table,
  */
 static void *__init unw_hdr_alloc_early(unsigned long sz)
 {
-	return __alloc_bootmem_nopanic(sz, sizeof(unsigned int),
-				       MAX_DMA_ADDRESS);
+	return memblock_alloc_from_nopanic(sz, sizeof(unsigned int),
+					   MAX_DMA_ADDRESS);
 }
 
 static void *unw_hdr_alloc(unsigned long sz)
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 67d48e26..041663a 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -106,7 +106,7 @@ static void * __init pcpu_alloc_bootmem(unsigned int cpu, unsigned long size,
 	void *ptr;
 
 	if (!node_online(node) || !NODE_DATA(node)) {
-		ptr = __alloc_bootmem_nopanic(size, align, goal);
+		ptr = memblock_alloc_from_nopanic(size, align, goal);
 		pr_info("cpu %d has no node %d or node-local memory\n",
 			cpu, node);
 		pr_debug("per cpu data for cpu%d %lu bytes at %016lx\n",
@@ -121,7 +121,7 @@ static void * __init pcpu_alloc_bootmem(unsigned int cpu, unsigned long size,
 	}
 	return ptr;
 #else
-	return __alloc_bootmem_nopanic(size, align, goal);
+	return memblock_alloc_from_nopanic(size, align, goal);
 #endif
 }
 
-- 
2.7.4
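The fallback behaviour described in this patch can be modelled in a few
lines of userspace C. This is only a sketch under stated assumptions:
"memory" is reduced to eight equal slots, an address is a slot index, and
all toy_* names are hypothetical. It shows the two-step policy (honour the
lower limit first, then retry without it) and that failure is reported by
return value instead of a panic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 8

/* Hypothetical model: one flag per slot, true once the slot is handed out. */
static bool toy_busy[TOY_SLOTS];

/* Claim the first free slot with index >= min_slot, or return -1. */
static int toy_find_from(int min_slot)
{
	for (int i = min_slot < 0 ? 0 : min_slot; i < TOY_SLOTS; i++) {
		if (!toy_busy[i]) {
			toy_busy[i] = true;
			return i;
		}
	}
	return -1;
}

/*
 * Models the behaviour described above: try at or above the goal first,
 * then retry with no lower limit. A negative return means "no memory";
 * the caller decides what to do, nothing panics here.
 */
static int toy_alloc_from_nopanic(int goal)
{
	int slot = toy_find_from(goal);

	if (slot < 0 && goal > 0)
		slot = toy_find_from(0);
	return slot;
}
```

With slots 6 and 7 taken, a third request with goal 6 is satisfied from
slot 0 rather than failing, which mirrors the "falls back to allocation
with no limit set" wording of the commit message.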
[PATCH 12/30] memblock: replace alloc_bootmem_low with memblock_alloc_low
The alloc_bootmem_low(size) allocates low memory with default alignment
and can be replaced by memblock_alloc_low(size, 0).

Signed-off-by: Mike Rapoport
Acked-by: Michal Hocko
---
 arch/arm64/kernel/setup.c     | 2 +-
 arch/unicore32/kernel/setup.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 5b4fac4..cf7a7b7 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -213,7 +213,7 @@ static void __init request_standard_resources(void)
 	kernel_data.end = __pa_symbol(_end - 1);
 
 	for_each_memblock(memory, region) {
-		res = alloc_bootmem_low(sizeof(*res));
+		res = memblock_alloc_low(sizeof(*res), 0);
 		if (memblock_is_nomap(region)) {
 			res->name = "reserved";
 			res->flags = IORESOURCE_MEM;
diff --git a/arch/unicore32/kernel/setup.c b/arch/unicore32/kernel/setup.c
index c2bffa5..9f163f9 100644
--- a/arch/unicore32/kernel/setup.c
+++ b/arch/unicore32/kernel/setup.c
@@ -207,7 +207,7 @@ request_standard_resources(struct meminfo *mi)
 		if (mi->bank[i].size == 0)
 			continue;
 
-		res = alloc_bootmem_low(sizeof(*res));
+		res = memblock_alloc_low(sizeof(*res), 0);
 		res->name = "System RAM";
 		res->start = mi->bank[i].start;
 		res->end = mi->bank[i].start + mi->bank[i].size - 1;
-- 
2.7.4
[PATCH 11/30] memblock: replace alloc_bootmem_pages_nopanic with memblock_alloc_nopanic
The alloc_bootmem_pages_nopanic(size) is a shortcut for
__alloc_bootmem_nopanic(size, PAGE_SIZE, BOOTMEM_LOW_LIMIT) which allocates
PAGE_SIZE aligned memory. Since BOOTMEM_LOW_LIMIT is hardwired to 0 there
are no restrictions on where the allocated memory should reside.

The memblock_alloc_nopanic(size, PAGE_SIZE) also allocates PAGE_SIZE aligned
memory without any restrictions and thus can be used as a replacement for
alloc_bootmem_pages_nopanic().

Signed-off-by: Mike Rapoport
Acked-by: Michal Hocko
---
 drivers/usb/early/xhci-dbc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/early/xhci-dbc.c b/drivers/usb/early/xhci-dbc.c
index e15e896..16df968 100644
--- a/drivers/usb/early/xhci-dbc.c
+++ b/drivers/usb/early/xhci-dbc.c
@@ -94,7 +94,7 @@ static void * __init xdbc_get_page(dma_addr_t *dma_addr)
 {
 	void *virt;
 
-	virt = alloc_bootmem_pages_nopanic(PAGE_SIZE);
+	virt = memblock_alloc_nopanic(PAGE_SIZE, PAGE_SIZE);
 	if (!virt)
 		return NULL;
 
-- 
2.7.4
[PATCH 10/30] memblock: replace __alloc_bootmem_node_nopanic with memblock_alloc_try_nid_nopanic
The __alloc_bootmem_node_nopanic() attempts to allocate memory for a
specified node. If the allocation fails it then retries to allocate memory
from any node. Upon success, the allocated memory is set to 0.

The memblock_alloc_try_nid_nopanic() does exactly the same thing and can be
used instead.

Signed-off-by: Mike Rapoport
Acked-by: Michal Hocko
---
 arch/x86/kernel/setup_percpu.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index ea554f8..67d48e26 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -112,8 +112,10 @@ static void * __init pcpu_alloc_bootmem(unsigned int cpu, unsigned long size,
 		pr_debug("per cpu data for cpu%d %lu bytes at %016lx\n",
 			 cpu, size, __pa(ptr));
 	} else {
-		ptr = __alloc_bootmem_node_nopanic(NODE_DATA(node),
-						   size, align, goal);
+		ptr = memblock_alloc_try_nid_nopanic(size, align, goal,
+						     BOOTMEM_ALLOC_ACCESSIBLE,
+						     node);
+
 		pr_debug("per cpu data for cpu%d %lu bytes on node%d at %016lx\n",
 			 cpu, size, node, __pa(ptr));
 	}
-- 
2.7.4
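The three properties the changelog above relies on (prefer the requested node, fall back to any node, return zeroed memory) can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel implementation: the toy_* names, the fixed node count and the per-node bump arenas are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NODES	2
#define TOY_NODE_BYTES	64
#define TOY_ANY_NODE	(-1)

/* Each toy "node" is a small arena with a bump pointer. */
struct toy_node {
	unsigned char mem[TOY_NODE_BYTES];
	size_t used;
};

/* Carve size bytes out of one node, or return NULL if it does not fit. */
static void *toy_alloc_on_node(struct toy_node *n, size_t size)
{
	void *p;

	if (size > TOY_NODE_BYTES - n->used)
		return NULL;
	p = n->mem + n->used;
	n->used += size;
	return p;
}

/*
 * Hypothetical analogue of a "try nid, nopanic" allocation: try the
 * requested node first, then every node in turn, and clear the memory
 * on success. Returns NULL instead of panicking when nothing fits.
 */
static void *toy_alloc_try_nid_nopanic(struct toy_node nodes[TOY_NODES],
				       size_t size, int nid)
{
	void *p = NULL;
	int i;

	if (nid != TOY_ANY_NODE) {
		assert(nid >= 0 && nid < TOY_NODES);
		p = toy_alloc_on_node(&nodes[nid], size);
	}
	for (i = 0; !p && i < TOY_NODES; i++)
		p = toy_alloc_on_node(&nodes[i], size);
	if (p)
		memset(p, 0, size);
	return p;
}
```

A caller that asks for node 1 gets node-local memory while it lasts and silently receives memory from another node afterwards, which is why the x86 per-cpu code only logs the node it asked for.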
[PATCH 08/30] memblock: replace alloc_bootmem_align with memblock_alloc
The functions are equivalent, just the latter does not require the
nobootmem translation layer.

Signed-off-by: Mike Rapoport
Acked-by: Michal Hocko
---
 arch/x86/xen/p2m.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index d6d74ef..5de761b 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -182,7 +182,7 @@ static void p2m_init_identity(unsigned long *p2m, unsigned long pfn)
 static void * __ref alloc_p2m_page(void)
 {
 	if (unlikely(!slab_is_available()))
-		return alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE);
+		return memblock_alloc(PAGE_SIZE, PAGE_SIZE);
 
 	return (void *)__get_free_page(GFP_KERNEL);
 }
-- 
2.7.4
[PATCH 09/30] memblock: replace alloc_bootmem_low with memblock_alloc_low
The functions are equivalent, just the latter does not require the
nobootmem translation layer.

Signed-off-by: Mike Rapoport
Acked-by: Michal Hocko
---
 arch/x86/kernel/tce_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/tce_64.c b/arch/x86/kernel/tce_64.c
index f386bad..54c9b5a 100644
--- a/arch/x86/kernel/tce_64.c
+++ b/arch/x86/kernel/tce_64.c
@@ -173,7 +173,7 @@ void * __init alloc_tce_table(void)
 	size = table_size_to_number_of_entries(specified_table_size);
 	size *= TCE_ENTRY_SIZE;
 
-	return __alloc_bootmem_low(size, size, 0);
+	return memblock_alloc_low(size, size);
 }
 
 void __init free_tce_table(void *tbl)
-- 
2.7.4
[PATCH 07/30] memblock: remove _virt from APIs returning virtual address
The conversion is done using sed -i 's@memblock_virt_alloc@memblock_alloc@g' \ $(git grep -l memblock_virt_alloc) Signed-off-by: Mike Rapoport --- arch/arm/kernel/setup.c | 4 ++-- arch/arm/mach-omap2/omap_hwmod.c | 6 ++--- arch/arm64/mm/kasan_init.c| 2 +- arch/arm64/mm/numa.c | 2 +- arch/mips/kernel/setup.c | 2 +- arch/powerpc/kernel/pci_32.c | 2 +- arch/powerpc/lib/alloc.c | 2 +- arch/powerpc/mm/mmu_context_nohash.c | 6 ++--- arch/powerpc/platforms/powermac/nvram.c | 2 +- arch/powerpc/platforms/powernv/pci-ioda.c | 6 ++--- arch/powerpc/platforms/ps3/setup.c| 2 +- arch/powerpc/sysdev/msi_bitmap.c | 2 +- arch/s390/kernel/setup.c | 12 +- arch/s390/kernel/smp.c| 2 +- arch/s390/kernel/topology.c | 4 ++-- arch/s390/numa/mode_emu.c | 2 +- arch/s390/numa/toptree.c | 2 +- arch/x86/mm/kasan_init_64.c | 4 ++-- arch/xtensa/mm/kasan_init.c | 2 +- drivers/clk/ti/clk.c | 2 +- drivers/firmware/memmap.c | 2 +- drivers/of/fdt.c | 2 +- drivers/of/unittest.c | 2 +- include/linux/bootmem.h | 38 +++ init/main.c | 6 ++--- kernel/dma/swiotlb.c | 8 +++ kernel/power/snapshot.c | 2 +- kernel/printk/printk.c| 4 ++-- lib/cpumask.c | 2 +- mm/hugetlb.c | 2 +- mm/kasan/kasan_init.c | 2 +- mm/memblock.c | 26 ++--- mm/page_alloc.c | 8 +++ mm/page_ext.c | 2 +- mm/percpu.c | 28 +++ mm/sparse-vmemmap.c | 2 +- mm/sparse.c | 12 +- 37 files changed, 108 insertions(+), 108 deletions(-) diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c index 4c249cb..39e6090 100644 --- a/arch/arm/kernel/setup.c +++ b/arch/arm/kernel/setup.c @@ -857,7 +857,7 @@ static void __init request_standard_resources(const struct machine_desc *mdesc) */ boot_alias_start = phys_to_idmap(start); if (arm_has_idmap_alias() && boot_alias_start != IDMAP_INVALID_ADDR) { - res = memblock_virt_alloc(sizeof(*res), 0); + res = memblock_alloc(sizeof(*res), 0); res->name = "System RAM (boot alias)"; res->start = boot_alias_start; res->end = phys_to_idmap(end); @@ -865,7 +865,7 @@ static void __init request_standard_resources(const 
struct machine_desc *mdesc) request_resource(_resource, res); } - res = memblock_virt_alloc(sizeof(*res), 0); + res = memblock_alloc(sizeof(*res), 0); res->name = "System RAM"; res->start = start; res->end = end; diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c index 56a1fe9..1f9b34a 100644 --- a/arch/arm/mach-omap2/omap_hwmod.c +++ b/arch/arm/mach-omap2/omap_hwmod.c @@ -726,7 +726,7 @@ static int __init _setup_clkctrl_provider(struct device_node *np) u64 size; int i; - provider = memblock_virt_alloc(sizeof(*provider), 0); + provider = memblock_alloc(sizeof(*provider), 0); if (!provider) return -ENOMEM; @@ -736,12 +736,12 @@ static int __init _setup_clkctrl_provider(struct device_node *np) of_property_count_elems_of_size(np, "reg", sizeof(u32)) / 2; provider->addr = - memblock_virt_alloc(sizeof(void *) * provider->num_addrs, 0); + memblock_alloc(sizeof(void *) * provider->num_addrs, 0); if (!provider->addr) return -ENOMEM; provider->size = - memblock_virt_alloc(sizeof(u32) * provider->num_addrs, 0); + memblock_alloc(sizeof(u32) * provider->num_addrs, 0); if (!provider->size) return -ENOMEM; diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c index 1214587..2391560 100644 --- a/arch/arm64/mm/kasan_init.c +++ b/arch/arm64/mm/kasan_init.c @@ -38,7 +38,7 @@ static pgd_t tmp_pg_dir[PTRS_PER_PGD] __initdata __aligned(PGD_SIZE); static phys_addr_t __init kasan_alloc_zeroed_page(int node) { - void *p = memblock_virt_alloc_try_nid(PAGE_SIZE, PAGE_SIZE, + void *p = memblock_alloc_try_nid(PAGE_SIZE, PAGE_SIZE, __pa(MAX_DMA_ADDRESS),
[PATCH 06/30] memblock: rename memblock_alloc{_nid, _try_nid} to memblock_phys_alloc*
Make it explicit that the caller gets a physical address rather than a virtual one. This will also allow using the memblock_alloc prefix for memblock allocations returning a virtual address, which is done in the following patches. The conversion is done using the following semantic patch: @@ expression e1, e2, e3; @@ ( - memblock_alloc(e1, e2) + memblock_phys_alloc(e1, e2) | - memblock_alloc_nid(e1, e2, e3) + memblock_phys_alloc_nid(e1, e2, e3) | - memblock_alloc_try_nid(e1, e2, e3) + memblock_phys_alloc_try_nid(e1, e2, e3) ) Signed-off-by: Mike Rapoport Acked-by: Michal Hocko --- arch/arm/mm/mmu.c | 2 +- arch/arm64/mm/mmu.c | 2 +- arch/arm64/mm/numa.c | 2 +- arch/c6x/mm/dma-coherent.c| 4 ++-- arch/nds32/mm/init.c | 8 arch/openrisc/mm/init.c | 2 +- arch/openrisc/mm/ioremap.c| 2 +- arch/powerpc/kernel/dt_cpu_ftrs.c | 4 +--- arch/powerpc/kernel/paca.c| 2 +- arch/powerpc/kernel/prom.c| 2 +- arch/powerpc/kernel/setup-common.c| 3 +-- arch/powerpc/kernel/setup_32.c| 10 +- arch/powerpc/mm/numa.c| 2 +- arch/powerpc/mm/pgtable_32.c | 2 +- arch/powerpc/mm/ppc_mmu_32.c | 2 +- arch/powerpc/platforms/pasemi/iommu.c | 2 +- arch/powerpc/platforms/powernv/opal.c | 2 +- arch/powerpc/sysdev/dart_iommu.c | 2 +- arch/s390/kernel/crash_dump.c | 2 +- arch/s390/kernel/setup.c | 3 ++- arch/s390/mm/vmem.c | 4 ++-- arch/s390/numa/numa.c | 2 +- arch/sparc/kernel/mdesc.c | 2 +- arch/sparc/kernel/prom_64.c | 2 +- arch/sparc/mm/init_64.c | 11 ++- arch/unicore32/mm/mmu.c | 2 +- arch/x86/mm/numa.c| 2 +- drivers/firmware/efi/memmap.c | 2 +- include/linux/memblock.h | 6 +++--- mm/memblock.c | 8 30 files changed, 50 insertions(+), 51 deletions(-) diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c index e46a6a4..f5cc1cc 100644 --- a/arch/arm/mm/mmu.c +++ b/arch/arm/mm/mmu.c @@ -721,7 +721,7 @@ EXPORT_SYMBOL(phys_mem_access_prot); static void __init *early_alloc_aligned(unsigned long sz, unsigned long align) { - void *ptr = __va(memblock_alloc(sz, align)); + void *ptr = __va(memblock_phys_alloc(sz, align)); 
memset(ptr, 0, sz); return ptr; } diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 8080c9f..b8e037b 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -83,7 +83,7 @@ static phys_addr_t __init early_pgtable_alloc(void) phys_addr_t phys; void *ptr; - phys = memblock_alloc(PAGE_SIZE, PAGE_SIZE); + phys = memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE); /* * The FIX_{PGD,PUD,PMD} slots may be in active use, but the FIX_PTE diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c index 146c04c..e5aacd6 100644 --- a/arch/arm64/mm/numa.c +++ b/arch/arm64/mm/numa.c @@ -237,7 +237,7 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn) if (start_pfn >= end_pfn) pr_info("Initmem setup node %d []\n", nid); - nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid); + nd_pa = memblock_phys_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid); nd = __va(nd_pa); /* report and initialize */ diff --git a/arch/c6x/mm/dma-coherent.c b/arch/c6x/mm/dma-coherent.c index d0a8e0c..01305c7 100644 --- a/arch/c6x/mm/dma-coherent.c +++ b/arch/c6x/mm/dma-coherent.c @@ -135,8 +135,8 @@ void __init coherent_mem_init(phys_addr_t start, u32 size) if (dma_size & (PAGE_SIZE - 1)) ++dma_pages; - bitmap_phys = memblock_alloc(BITS_TO_LONGS(dma_pages) * sizeof(long), -sizeof(long)); + bitmap_phys = memblock_phys_alloc(BITS_TO_LONGS(dma_pages) * sizeof(long), + sizeof(long)); dma_bitmap = phys_to_virt(bitmap_phys); memset(dma_bitmap, 0, dma_pages * PAGE_SIZE); diff --git a/arch/nds32/mm/init.c b/arch/nds32/mm/init.c index c713d2a..5af81b8 100644 --- a/arch/nds32/mm/init.c +++ b/arch/nds32/mm/init.c @@ -81,7 +81,7 @@ static void __init map_ram(void) } /* Alloc one page for holding PTE's... 
*/ - pte = (pte_t *) __va(memblock_alloc(PAGE_SIZE, PAGE_SIZE)); + pte = (pte_t *) __va(memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE)); memset(pte, 0, PAGE_SIZE); set_pmd(pme, __pmd(__pa(pte) + _PAGE_KERNEL_TABLE)); @@ -114,7 +114,7 @@ static void __init fixedrange_init(void) pgd = swapper_pg_dir + pgd_index(vaddr); pud = pud_offset(pgd, vaddr); pmd = pmd_offset(pud, vaddr); - fixmap_pmd_p = (pmd_t *)
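The naming split in this patch (a *_phys_alloc call hands back a physical address that the caller converts and clears itself, as early_alloc_aligned() does above) can be illustrated outside the kernel. The following is a hedged user-space sketch: toy_ram, TOY_RAM_BASE, toy_va(), toy_pa(), toy_phys_alloc() and toy_alloc() are all invented for the example, and the zeroing done by the virtual-address variant is an assumption of this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t toy_phys_addr_t;

#define TOY_RAM_BASE	0x80000000ULL	/* made-up physical base address */
#define TOY_RAM_SIZE	4096

static unsigned char toy_ram[TOY_RAM_SIZE];	/* backing store of the model */
static size_t toy_next;				/* bump pointer into toy_ram */

/* Toy counterparts of __va() and __pa(): a fixed linear mapping. */
static void *toy_va(toy_phys_addr_t pa)
{
	return toy_ram + (pa - TOY_RAM_BASE);
}

static toy_phys_addr_t toy_pa(const void *va)
{
	return TOY_RAM_BASE +
	       (toy_phys_addr_t)((const unsigned char *)va - toy_ram);
}

/* Returns a physical address (0 on failure); memory is NOT cleared. */
static toy_phys_addr_t toy_phys_alloc(size_t size, size_t align)
{
	size_t start;

	assert(align && (align & (align - 1)) == 0);	/* power of two */
	start = (toy_next + align - 1) & ~(align - 1);
	if (start > TOY_RAM_SIZE || size > TOY_RAM_SIZE - start)
		return 0;
	toy_next = start + size;
	return TOY_RAM_BASE + start;
}

/* Returns a usable pointer to cleared memory, or NULL on failure. */
static void *toy_alloc(size_t size, size_t align)
{
	toy_phys_addr_t pa = toy_phys_alloc(size, align);
	void *p;

	if (!pa)
		return NULL;
	p = toy_va(pa);
	memset(p, 0, size);
	return p;
}
```

The point of the rename is visible in the return types: a caller of toy_phys_alloc() cannot dereference the result without going through toy_va(), while toy_alloc() returns something that can be used directly.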
[PATCH 05/30] mm: nobootmem: remove dead code
Several bootmem functions and macros are not used. Remove them. Signed-off-by: Mike Rapoport --- include/linux/bootmem.h | 26 -- mm/nobootmem.c | 35 --- 2 files changed, 61 deletions(-) diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h index fce6278..b74bafd1 100644 --- a/include/linux/bootmem.h +++ b/include/linux/bootmem.h @@ -36,17 +36,6 @@ extern void free_bootmem_node(pg_data_t *pgdat, extern void free_bootmem(unsigned long physaddr, unsigned long size); extern void free_bootmem_late(unsigned long physaddr, unsigned long size); -/* - * Flags for reserve_bootmem (also if CONFIG_HAVE_ARCH_BOOTMEM_NODE, - * the architecture-specific code should honor this). - * - * If flags is BOOTMEM_DEFAULT, then the return value is always 0 (success). - * If flags contains BOOTMEM_EXCLUSIVE, then -EBUSY is returned if the memory - * already was reserved. - */ -#define BOOTMEM_DEFAULT0 -#define BOOTMEM_EXCLUSIVE (1<<0) - extern void *__alloc_bootmem(unsigned long size, unsigned long align, unsigned long goal); @@ -73,13 +62,6 @@ void *___alloc_bootmem_node_nopanic(pg_data_t *pgdat, extern void *__alloc_bootmem_low(unsigned long size, unsigned long align, unsigned long goal) __malloc; -void *__alloc_bootmem_low_nopanic(unsigned long size, -unsigned long align, -unsigned long goal) __malloc; -extern void *__alloc_bootmem_low_node(pg_data_t *pgdat, - unsigned long size, - unsigned long align, - unsigned long goal) __malloc; /* We are using top down, so it is safe to use 0 here */ #define BOOTMEM_LOW_LIMIT 0 @@ -92,8 +74,6 @@ extern void *__alloc_bootmem_low_node(pg_data_t *pgdat, __alloc_bootmem(x, SMP_CACHE_BYTES, BOOTMEM_LOW_LIMIT) #define alloc_bootmem_align(x, align) \ __alloc_bootmem(x, align, BOOTMEM_LOW_LIMIT) -#define alloc_bootmem_nopanic(x) \ - __alloc_bootmem_nopanic(x, SMP_CACHE_BYTES, BOOTMEM_LOW_LIMIT) #define alloc_bootmem_pages(x) \ __alloc_bootmem(x, PAGE_SIZE, BOOTMEM_LOW_LIMIT) #define alloc_bootmem_pages_nopanic(x) \ @@ -104,17 +84,11 @@ extern 
void *__alloc_bootmem_low_node(pg_data_t *pgdat, __alloc_bootmem_node_nopanic(pgdat, x, SMP_CACHE_BYTES, BOOTMEM_LOW_LIMIT) #define alloc_bootmem_pages_node(pgdat, x) \ __alloc_bootmem_node(pgdat, x, PAGE_SIZE, BOOTMEM_LOW_LIMIT) -#define alloc_bootmem_pages_node_nopanic(pgdat, x) \ - __alloc_bootmem_node_nopanic(pgdat, x, PAGE_SIZE, BOOTMEM_LOW_LIMIT) #define alloc_bootmem_low(x) \ __alloc_bootmem_low(x, SMP_CACHE_BYTES, 0) -#define alloc_bootmem_low_pages_nopanic(x) \ - __alloc_bootmem_low_nopanic(x, PAGE_SIZE, 0) #define alloc_bootmem_low_pages(x) \ __alloc_bootmem_low(x, PAGE_SIZE, 0) -#define alloc_bootmem_low_pages_node(pgdat, x) \ - __alloc_bootmem_low_node(pgdat, x, PAGE_SIZE, 0) /* FIXME: use MEMBLOCK_ALLOC_* variants here */ #define BOOTMEM_ALLOC_ACCESSIBLE 0 diff --git a/mm/nobootmem.c b/mm/nobootmem.c index d4d0cd4..44ce7de 100644 --- a/mm/nobootmem.c +++ b/mm/nobootmem.c @@ -404,38 +404,3 @@ void * __init __alloc_bootmem_low(unsigned long size, unsigned long align, { return ___alloc_bootmem(size, align, goal, ARCH_LOW_ADDRESS_LIMIT); } - -void * __init __alloc_bootmem_low_nopanic(unsigned long size, - unsigned long align, - unsigned long goal) -{ - return ___alloc_bootmem_nopanic(size, align, goal, - ARCH_LOW_ADDRESS_LIMIT); -} - -/** - * __alloc_bootmem_low_node - allocate low boot memory from a specific node - * @pgdat: node to allocate from - * @size: size of the request in bytes - * @align: alignment of the region - * @goal: preferred starting address of the region - * - * The goal is dropped if it can not be satisfied and the allocation will - * fall back to memory below @goal. - * - * Allocation may fall back to any node in the system if the specified node - * can not hold the requested memory. - * - * The function panics if the request can not be satisfied. - * - * Return: address of the allocated region. 
- */ -void * __init __alloc_bootmem_low_node(pg_data_t *pgdat, unsigned long size, - unsigned long align, unsigned long goal) -{ - if (WARN_ON_ONCE(slab_is_available())) - return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id); - - return ___alloc_bootmem_node(pgdat, size, align, goal, -ARCH_LOW_ADDRESS_LIMIT); -} -- 2.7.4
[PATCH 04/30] mm: remove bootmem allocator implementation.
All architectures have been converted to use MEMBLOCK + NO_BOOTMEM. The bootmem allocator implementation can be removed. Signed-off-by: Mike Rapoport Acked-by: Michal Hocko --- include/linux/bootmem.h | 16 - mm/bootmem.c| 811 2 files changed, 827 deletions(-) delete mode 100644 mm/bootmem.c diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h index ee61ac3..fce6278 100644 --- a/include/linux/bootmem.h +++ b/include/linux/bootmem.h @@ -26,14 +26,6 @@ extern unsigned long max_pfn; */ extern unsigned long long max_possible_pfn; -extern unsigned long bootmem_bootmap_pages(unsigned long); - -extern unsigned long init_bootmem_node(pg_data_t *pgdat, - unsigned long freepfn, - unsigned long startpfn, - unsigned long endpfn); -extern unsigned long init_bootmem(unsigned long addr, unsigned long memend); - extern unsigned long free_all_bootmem(void); extern void reset_node_managed_pages(pg_data_t *pgdat); extern void reset_all_zones_managed_pages(void); @@ -55,14 +47,6 @@ extern void free_bootmem_late(unsigned long physaddr, unsigned long size); #define BOOTMEM_DEFAULT0 #define BOOTMEM_EXCLUSIVE (1<<0) -extern int reserve_bootmem(unsigned long addr, - unsigned long size, - int flags); -extern int reserve_bootmem_node(pg_data_t *pgdat, - unsigned long physaddr, - unsigned long size, - int flags); - extern void *__alloc_bootmem(unsigned long size, unsigned long align, unsigned long goal); diff --git a/mm/bootmem.c b/mm/bootmem.c deleted file mode 100644 index 97db0e8..000 --- a/mm/bootmem.c +++ /dev/null @@ -1,811 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* - * bootmem - A boot-time physical memory allocator and configurator - * - * Copyright (C) 1999 Ingo Molnar - *1999 Kanoj Sarcar, SGI - *2008 Johannes Weiner - * - * Access to this subsystem has to be serialized externally (which is true - * for the boot process anyway). 
- */ -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include "internal.h" - -/** - * DOC: bootmem overview - * - * Bootmem is a boot-time physical memory allocator and configurator. - * - * It is used early in the boot process before the page allocator is - * set up. - * - * Bootmem is based on the most basic of allocators, a First Fit - * allocator which uses a bitmap to represent memory. If a bit is 1, - * the page is allocated and 0 if unallocated. To satisfy allocations - * of sizes smaller than a page, the allocator records the Page Frame - * Number (PFN) of the last allocation and the offset the allocation - * ended at. Subsequent small allocations are merged together and - * stored on the same page. - * - * The information used by the bootmem allocator is represented by - * :c:type:`struct bootmem_data`. An array to hold up to %MAX_NUMNODES - * such structures is statically allocated and then it is discarded - * when the system initialization completes. Each entry in this array - * corresponds to a node with memory. For UMA systems only entry 0 is - * used. - * - * The bootmem allocator is initialized during early architecture - * specific setup. Each architecture is required to supply a - * :c:func:`setup_arch` function which, among other tasks, is - * responsible for acquiring the necessary parameters to initialise - * the boot memory allocator. These parameters define limits of usable - * physical memory: - * - * * @min_low_pfn - the lowest PFN that is available in the system - * * @max_low_pfn - the highest PFN that may be addressed by low - * memory (%ZONE_NORMAL) - * * @max_pfn - the last PFN available to the system. - * - * After those limits are determined, the :c:func:`init_bootmem` or - * :c:func:`init_bootmem_node` function should be called to initialize - * the bootmem allocator. The UMA case should use the `init_bootmem` - * function. 
It will initialize ``contig_page_data`` structure that - * represents the only memory node in the system. In the NUMA case the - * `init_bootmem_node` function should be called to initialize the - * bootmem allocator for each node. - * - * Once the allocator is set up, it is possible to use either single - * node or NUMA variant of the allocation APIs. - */ - -#ifndef CONFIG_NEED_MULTIPLE_NODES -struct pglist_data __refdata contig_page_data = { - .bdata = _node_data[0] -}; -EXPORT_SYMBOL(contig_page_data); -#endif - -unsigned long max_low_pfn; -unsigned long min_low_pfn; -unsigned long max_pfn; -unsigned long long max_possible_pfn; - -bootmem_data_t bootmem_node_data[MAX_NUMNODES] __initdata; - -static struct list_head
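The removed overview comment describes bootmem as a first-fit allocator driven by a bitmap with one bit per page. For anyone who never looked at mm/bootmem.c, a minimal user-space sketch of that idea follows. It is not the deleted code: the toy_* names and the 32-page limit are made up, it works on page frame numbers only, and it leaves out the merging of sub-page allocations that the comment mentions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_PAGES	32
#define TOY_NO_PFN	((size_t)-1)

/* One bit per page: 1 = allocated, 0 = free. */
static unsigned char toy_bitmap[(TOY_PAGES + CHAR_BIT - 1) / CHAR_BIT];

static int toy_test_bit(size_t pfn)
{
	return (toy_bitmap[pfn / CHAR_BIT] >> (pfn % CHAR_BIT)) & 1;
}

static void toy_set_bit(size_t pfn)
{
	toy_bitmap[pfn / CHAR_BIT] |= (unsigned char)(1u << (pfn % CHAR_BIT));
}

static void toy_clear_bit(size_t pfn)
{
	toy_bitmap[pfn / CHAR_BIT] &= (unsigned char)~(1u << (pfn % CHAR_BIT));
}

/* First fit: claim the lowest run of npages free pages, return its PFN. */
static size_t toy_alloc_pages(size_t npages)
{
	size_t start, i;

	if (npages == 0 || npages > TOY_PAGES)
		return TOY_NO_PFN;
	for (start = 0; start + npages <= TOY_PAGES; start++) {
		for (i = 0; i < npages && !toy_test_bit(start + i); i++)
			;
		if (i == npages) {
			for (i = 0; i < npages; i++)
				toy_set_bit(start + i);
			return start;
		}
		start += i;	/* jump to the allocated page that ended the run */
	}
	return TOY_NO_PFN;
}

/* Mark npages pages starting at pfn as free again. */
static void toy_free_pages(size_t pfn, size_t npages)
{
	size_t i;

	for (i = 0; i < npages; i++)
		toy_clear_bit(pfn + i);
}
```

Because the search always starts from page 0, a freed hole at the bottom is reused before higher memory is touched; a hole that is too small is skipped and the search continues above it.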
[PATCH 03/30] mm: remove CONFIG_HAVE_MEMBLOCK
All architectures use memblock for early memory management. There is no need for the CONFIG_HAVE_MEMBLOCK configuration option. Signed-off-by: Mike Rapoport --- arch/alpha/Kconfig | 1 - arch/arc/Kconfig| 1 - arch/arm/Kconfig| 1 - arch/arm64/Kconfig | 1 - arch/c6x/Kconfig| 1 - arch/h8300/Kconfig | 1 - arch/hexagon/Kconfig| 1 - arch/ia64/Kconfig | 1 - arch/m68k/Kconfig | 1 - arch/microblaze/Kconfig | 1 - arch/mips/Kconfig | 1 - arch/nds32/Kconfig | 1 - arch/nios2/Kconfig | 1 - arch/openrisc/Kconfig | 1 - arch/parisc/Kconfig | 1 - arch/powerpc/Kconfig| 1 - arch/riscv/Kconfig | 1 - arch/s390/Kconfig | 1 - arch/sh/Kconfig | 1 - arch/sparc/Kconfig | 1 - arch/um/Kconfig | 1 - arch/unicore32/Kconfig | 1 - arch/x86/Kconfig| 1 - arch/xtensa/Kconfig | 1 - drivers/of/fdt.c| 2 - drivers/of/of_reserved_mem.c| 13 + drivers/staging/android/ion/Kconfig | 2 +- fs/pstore/Kconfig | 1 - include/linux/bootmem.h | 112 include/linux/memblock.h| 2 - include/linux/mm.h | 2 +- lib/Kconfig.debug | 3 +- mm/Kconfig | 5 +- mm/Makefile | 2 +- mm/nobootmem.c | 4 -- mm/page_alloc.c | 4 +- 36 files changed, 8 insertions(+), 168 deletions(-) diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig index 04de6be..5b4f883 100644 --- a/arch/alpha/Kconfig +++ b/arch/alpha/Kconfig @@ -31,7 +31,6 @@ config ALPHA select ODD_RT_SIGACTION select OLD_SIGSUSPEND select CPU_NO_EFFICIENT_FFS if !ALPHA_EV67 - select HAVE_MEMBLOCK help The Alpha is a 64-bit general-purpose processor designed and marketed by the Digital Equipment Corporation of blessed memory, diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig index 04ebead..5260440 100644 --- a/arch/arc/Kconfig +++ b/arch/arc/Kconfig @@ -37,7 +37,6 @@ config ARC select HAVE_KERNEL_LZMA select HAVE_KPROBES select HAVE_KRETPROBES - select HAVE_MEMBLOCK select HAVE_MOD_ARCH_SPECIFIC select HAVE_OPROFILE select HAVE_PERF_EVENTS diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index a961d70..33f4653 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -82,7 +82,6 @@ config 
ARM select HAVE_KERNEL_XZ select HAVE_KPROBES if !XIP_KERNEL && !CPU_ENDIAN_BE32 && !CPU_V7M select HAVE_KRETPROBES if (HAVE_KPROBES) - select HAVE_MEMBLOCK select HAVE_MOD_ARCH_SPECIFIC select HAVE_NMI select HAVE_OPROFILE if (HAVE_PERF_EVENTS) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 1795eaa..23ae619 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -134,7 +134,6 @@ config ARM64 select HAVE_GENERIC_DMA_COHERENT select HAVE_HW_BREAKPOINT if PERF_EVENTS select HAVE_IRQ_TIME_ACCOUNTING - select HAVE_MEMBLOCK select HAVE_MEMBLOCK_NODE_MAP if NUMA select HAVE_NMI select HAVE_PATA_PLATFORM diff --git a/arch/c6x/Kconfig b/arch/c6x/Kconfig index a641b0b..833fdb0 100644 --- a/arch/c6x/Kconfig +++ b/arch/c6x/Kconfig @@ -13,7 +13,6 @@ config C6X select GENERIC_ATOMIC64 select GENERIC_IRQ_SHOW select HAVE_ARCH_TRACEHOOK - select HAVE_MEMBLOCK select SPARSE_IRQ select IRQ_DOMAIN select OF diff --git a/arch/h8300/Kconfig b/arch/h8300/Kconfig index 5e89d40..d19c6b16 100644 --- a/arch/h8300/Kconfig +++ b/arch/h8300/Kconfig @@ -15,7 +15,6 @@ config H8300 select OF select OF_IRQ select OF_EARLY_FLATTREE - select HAVE_MEMBLOCK select TIMER_OF select H8300_TMR8 select HAVE_KERNEL_GZIP diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig index 24a6da9..d9ae82b 100644 --- a/arch/hexagon/Kconfig +++ b/arch/hexagon/Kconfig @@ -31,7 +31,6 @@ config HEXAGON select GENERIC_CLOCKEVENTS_BROADCAST select MODULES_USE_ELF_RELA select GENERIC_CPU_DEVICES - select HAVE_MEMBLOCK select ARCH_DISCARD_MEMBLOCK ---help--- Qualcomm Hexagon is a processor architecture designed for high diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index 2bf4ef7..36773de 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -26,7 +26,6 @@ config IA64 select HAVE_FUNCTION_TRACER select TTY select HAVE_ARCH_TRACEHOOK -
[PATCH 02/30] mm: remove CONFIG_NO_BOOTMEM
All architectures select NO_BOOTMEM which essentially becomes 'Y' for any kernel configuration and therefore it can be removed. Signed-off-by: Mike Rapoport --- arch/alpha/Kconfig | 1 - arch/arc/Kconfig| 1 - arch/arm/Kconfig| 1 - arch/arm64/Kconfig | 1 - arch/c6x/Kconfig| 1 - arch/h8300/Kconfig | 1 - arch/hexagon/Kconfig| 1 - arch/ia64/Kconfig | 1 - arch/m68k/Kconfig | 1 - arch/microblaze/Kconfig | 1 - arch/mips/Kconfig | 1 - arch/nds32/Kconfig | 1 - arch/nios2/Kconfig | 1 - arch/openrisc/Kconfig | 1 - arch/parisc/Kconfig | 1 - arch/powerpc/Kconfig| 1 - arch/riscv/Kconfig | 1 - arch/s390/Kconfig | 1 - arch/sh/Kconfig | 1 - arch/sparc/Kconfig | 1 - arch/um/Kconfig | 1 - arch/unicore32/Kconfig | 1 - arch/x86/Kconfig| 3 --- arch/xtensa/Kconfig | 1 - include/linux/bootmem.h | 36 ++-- include/linux/mmzone.h | 5 + mm/Kconfig | 3 --- mm/Makefile | 7 +-- mm/memblock.c | 2 -- 29 files changed, 4 insertions(+), 75 deletions(-) diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig index 620b0a7..04de6be 100644 --- a/arch/alpha/Kconfig +++ b/arch/alpha/Kconfig @@ -32,7 +32,6 @@ config ALPHA select OLD_SIGSUSPEND select CPU_NO_EFFICIENT_FFS if !ALPHA_EV67 select HAVE_MEMBLOCK - select NO_BOOTMEM help The Alpha is a 64-bit general-purpose processor designed and marketed by the Digital Equipment Corporation of blessed memory, diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig index b4441b0..04ebead 100644 --- a/arch/arc/Kconfig +++ b/arch/arc/Kconfig @@ -44,7 +44,6 @@ config ARC select HANDLE_DOMAIN_IRQ select IRQ_DOMAIN select MODULES_USE_ELF_RELA - select NO_BOOTMEM select OF select OF_EARLY_FLATTREE select OF_RESERVED_MEM diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 4607d32..a961d70 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -100,7 +100,6 @@ config ARM select IRQ_FORCED_THREADING select MODULES_USE_ELF_REL select NEED_DMA_MAP_STATE - select NO_BOOTMEM select OF_EARLY_FLATTREE if OF select OF_RESERVED_MEM if OF select OLD_SIGACTION diff --git 
a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 0128d84..1795eaa 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -156,7 +156,6 @@ config ARM64 select MULTI_IRQ_HANDLER select NEED_DMA_MAP_STATE select NEED_SG_DMA_LENGTH - select NO_BOOTMEM select OF select OF_EARLY_FLATTREE select OF_RESERVED_MEM diff --git a/arch/c6x/Kconfig b/arch/c6x/Kconfig index 85ed568..a641b0b 100644 --- a/arch/c6x/Kconfig +++ b/arch/c6x/Kconfig @@ -14,7 +14,6 @@ config C6X select GENERIC_IRQ_SHOW select HAVE_ARCH_TRACEHOOK select HAVE_MEMBLOCK - select NO_BOOTMEM select SPARSE_IRQ select IRQ_DOMAIN select OF diff --git a/arch/h8300/Kconfig b/arch/h8300/Kconfig index 0b334b6..5e89d40 100644 --- a/arch/h8300/Kconfig +++ b/arch/h8300/Kconfig @@ -16,7 +16,6 @@ config H8300 select OF_IRQ select OF_EARLY_FLATTREE select HAVE_MEMBLOCK - select NO_BOOTMEM select TIMER_OF select H8300_TMR8 select HAVE_KERNEL_GZIP diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig index 3ba6873..24a6da9 100644 --- a/arch/hexagon/Kconfig +++ b/arch/hexagon/Kconfig @@ -33,7 +33,6 @@ config HEXAGON select GENERIC_CPU_DEVICES select HAVE_MEMBLOCK select ARCH_DISCARD_MEMBLOCK - select NO_BOOTMEM ---help--- Qualcomm Hexagon is a processor architecture designed for high performance and low power across a wide variety of applications. 
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index 8b4a0c17..2bf4ef7 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -28,7 +28,6 @@ config IA64 select HAVE_ARCH_TRACEHOOK select HAVE_MEMBLOCK select HAVE_MEMBLOCK_NODE_MAP - select NO_BOOTMEM select HAVE_VIRT_CPU_ACCOUNTING select ARCH_HAS_DMA_MARK_CLEAN select ARCH_HAS_SG_CHAIN diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig index 0705537..8c7111d 100644 --- a/arch/m68k/Kconfig +++ b/arch/m68k/Kconfig @@ -29,7 +29,6 @@ config M68K select DMA_NONCOHERENT_OPS if HAS_DMA select HAVE_MEMBLOCK select ARCH_DISCARD_MEMBLOCK - select NO_BOOTMEM config CPU_BIG_ENDIAN def_bool y diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig index ace5c5b..56379b9 100644 --- a/arch/microblaze/Kconfig +++ b/arch/microblaze/Kconfig @@ -28,7 +28,6 @@ config MICROBLAZE select HAVE_FTRACE_MCOUNT_RECORD select HAVE_FUNCTION_GRAPH_TRACER select HAVE_FUNCTION_TRACER -
[PATCH 01/30] mips: switch to NO_BOOTMEM
MIPS already has memblock support and all the memory is already registered with it. This patch replaces bootmem memory reservations with memblock ones and removes the bootmem initialization. Since memblock allocates memory in top-down mode, we ensure that memblock limit is max_low_pfn to prevent allocations from the high memory. To have the exceptions base in the lower 512M of the physical memory, its allocation in arch/mips/kernel/traps.c::traps_init() is using bottom-up mode. Signed-off-by: Mike Rapoport --- arch/mips/Kconfig | 1 + arch/mips/kernel/setup.c | 99 -- arch/mips/kernel/traps.c | 3 ++ arch/mips/loongson64/loongson-3/numa.c | 34 ++-- arch/mips/sgi-ip27/ip27-memory.c | 11 ++-- 5 files changed, 46 insertions(+), 102 deletions(-) diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 54532f2..1b5fa1a 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -78,6 +78,7 @@ config MIPS select RTC_LIB select SYSCTL_EXCEPTION_TRACE select VIRT_TO_BUS + select NO_BOOTMEM menu "Machine selection" diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c index 32fc11d..2fde53e 100644 --- a/arch/mips/kernel/setup.c +++ b/arch/mips/kernel/setup.c @@ -333,7 +333,7 @@ static void __init finalize_initrd(void) maybe_bswap_initrd(); - reserve_bootmem(__pa(initrd_start), size, BOOTMEM_DEFAULT); + memblock_reserve(__pa(initrd_start), size); initrd_below_start_ok = 1; pr_info("Initial ramdisk at: 0x%lx (%lu bytes)\n", @@ -370,20 +370,10 @@ static void __init bootmem_init(void) #else /* !CONFIG_SGI_IP27 */ -static unsigned long __init bootmap_bytes(unsigned long pages) -{ - unsigned long bytes = DIV_ROUND_UP(pages, 8); - - return ALIGN(bytes, sizeof(long)); -} - static void __init bootmem_init(void) { unsigned long reserved_end; - unsigned long mapstart = ~0UL; - unsigned long bootmap_size; phys_addr_t ramstart = PHYS_ADDR_MAX; - bool bootmap_valid = false; int i; /* @@ -395,6 +385,8 @@ static void __init bootmem_init(void) init_initrd(); reserved_end = (unsigned 
long) PFN_UP(__pa_symbol(&_end)); + memblock_reserve(PHYS_OFFSET, reserved_end << PAGE_SHIFT); + /* * max_low_pfn is not a number of pages. The number of pages * of the system is given by 'max_low_pfn - min_low_pfn'. @@ -442,9 +434,6 @@ static void __init bootmem_init(void) if (initrd_end && end <= (unsigned long)PFN_UP(__pa(initrd_end))) continue; #endif - if (start >= mapstart) - continue; - mapstart = max(reserved_end, start); } if (min_low_pfn >= max_low_pfn) @@ -456,9 +445,11 @@ static void __init bootmem_init(void) /* * Reserve any memory between the start of RAM and PHYS_OFFSET */ - if (ramstart > PHYS_OFFSET) + if (ramstart > PHYS_OFFSET) { add_memory_region(PHYS_OFFSET, ramstart - PHYS_OFFSET, BOOT_MEM_RESERVED); + memblock_reserve(PHYS_OFFSET, ramstart - PHYS_OFFSET); + } if (min_low_pfn > ARCH_PFN_OFFSET) { pr_info("Wasting %lu bytes for tracking %lu unused pages\n", @@ -483,52 +474,6 @@ static void __init bootmem_init(void) max_low_pfn = PFN_DOWN(HIGHMEM_START); } -#ifdef CONFIG_BLK_DEV_INITRD - /* -* mapstart should be after initrd_end -*/ - if (initrd_end) - mapstart = max(mapstart, (unsigned long)PFN_UP(__pa(initrd_end))); -#endif - - /* -* check that mapstart doesn't overlap with any of -* memory regions that have been reserved through eg. DTB -*/ - bootmap_size = bootmap_bytes(max_low_pfn - min_low_pfn); - - bootmap_valid = memory_region_available(PFN_PHYS(mapstart), - bootmap_size); - for (i = 0; i < boot_mem_map.nr_map && !bootmap_valid; i++) { - unsigned long mapstart_addr; - - switch (boot_mem_map.map[i].type) { - case BOOT_MEM_RESERVED: - mapstart_addr = PFN_ALIGN(boot_mem_map.map[i].addr + - boot_mem_map.map[i].size); - if (PHYS_PFN(mapstart_addr) < mapstart) - break; - - bootmap_valid = memory_region_available(mapstart_addr, - bootmap_size); - if (bootmap_valid) - mapstart = PHYS_PFN(mapstart_addr); - break; - default: - break; - } - } - - if
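The changelog for the MIPS conversion depends on two allocator settings: a limit that keeps the default top-down allocations out of high memory, and a temporary switch to bottom-up mode so that the exceptions base lands low. A toy user-space model of those two settings is sketched below. It is only an illustration: struct toy_memblock and toy_alloc() are invented names, the model has a single free range, and the addresses in the usage note are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy allocator state: one free range plus the two settings of interest. */
struct toy_memblock {
	uint64_t start;		/* first free address */
	uint64_t end;		/* one past the last free address */
	uint64_t limit;		/* allocations must end at or below this */
	bool bottom_up;		/* false: allocate from the top (default) */
};

/*
 * Allocate size bytes with the given power-of-two alignment; return the
 * address or 0 on failure.
 * Simplification: a top-down allocation also gives up any free memory
 * above the block it returns (for example memory above the limit).
 */
static uint64_t toy_alloc(struct toy_memblock *mb, uint64_t size,
			  uint64_t align)
{
	uint64_t top = mb->end < mb->limit ? mb->end : mb->limit;
	uint64_t base;

	assert(align && (align & (align - 1)) == 0);
	if (mb->bottom_up) {
		base = (mb->start + align - 1) & ~(align - 1);	/* round up */
		if (base > top || size > top - base)
			return 0;
		mb->start = base + size;
	} else {
		if (top < mb->start || size > top - mb->start)
			return 0;
		base = (top - size) & ~(align - 1);		/* round down */
		if (base < mb->start)
			return 0;
		mb->end = base;
	}
	return base;
}
```

With free memory from 0x1000 to 0x100000 and a limit of 0x40000, a default allocation of one 0x1000-byte block is placed just below the limit at 0x3f000, whereas the same request made with bottom_up set comes back at 0x1000.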
[PATCH 00/30] mm: remove bootmem allocator
Hi,

These patches switch early memory management to use memblock directly
without any bootmem compatibility wrappers. As a result, both bootmem and
nobootmem are removed.

The patchset survived allyesconfig builds on arm, arm64, i386, mips, nds32,
parisc, powerpc, riscv, s390 and x86 and most of the *_defconfig builds for
all architectures except unicore32.

The patchset is based on v4.19-rc3-mmotm-2018-09-12-16-40, so I needed a
small PSI fix from [1] for some of the builds.

I did my best to verify that the failures are not caused by my changes, but
I may have missed something. Most defconfig build failures I've seen were
caused by the assembler being unhappy about an unsupported opcode, a wrong
encoding or something else. Some builds for allyesconfig also failed because
of it, and others failed because of a symbol mismatch in spi-sprd or n_hdlc.

I've done boot testing on real x86-64 and Power8 machines and on
qemu-system-alpha and qemu-system-mips64el VMs.

I've tried to keep the distribution list as small as possible, but it's
still pretty long; my apologies for spamming.

Changes since RFC:
* updated MIPS conversion to nobootmem:
  - set memblock limit to max_low_pfn to avoid allocation attempts from
    high memory
  - use bottom-up mode for allocation of the exceptions base
* added elaborate changelogs
* updated boot-time-mm documentation

[1] https://lkml.org/lkml/2018/9/13/88

Mike Rapoport (30):
  mips: switch to NO_BOOTMEM
  mm: remove CONFIG_NO_BOOTMEM
  mm: remove CONFIG_HAVE_MEMBLOCK
  mm: remove bootmem allocator implementation.
  mm: nobootmem: remove dead code
  memblock: rename memblock_alloc{_nid,_try_nid} to memblock_phys_alloc*
  memblock: remove _virt from APIs returning virtual address
  memblock: replace alloc_bootmem_align with memblock_alloc
  memblock: replace alloc_bootmem_low with memblock_alloc_low
  memblock: replace __alloc_bootmem_node_nopanic with memblock_alloc_try_nid_nopanic
  memblock: replace alloc_bootmem_pages_nopanic with memblock_alloc_nopanic
  memblock: replace alloc_bootmem_low with memblock_alloc_low
  memblock: replace __alloc_bootmem_nopanic with memblock_alloc_from_nopanic
  memblock: add align parameter to memblock_alloc_node()
  memblock: replace alloc_bootmem_pages_node with memblock_alloc_node
  memblock: replace __alloc_bootmem_node with appropriate memblock_ API
  memblock: replace alloc_bootmem_node with memblock_alloc_node
  memblock: replace alloc_bootmem_low_pages with memblock_alloc_low
  memblock: replace alloc_bootmem_pages with memblock_alloc
  memblock: replace __alloc_bootmem with memblock_alloc_from
  memblock: replace alloc_bootmem with memblock_alloc
  mm: nobootmem: remove bootmem allocation APIs
  memblock: replace free_bootmem{_node} with memblock_free
  memblock: replace free_bootmem_late with memblock_free_late
  memblock: rename free_all_bootmem to memblock_free_all
  memblock: rename __free_pages_bootmem to memblock_free_pages
  mm: remove nobootmem
  memblock: replace BOOTMEM_ALLOC_* with MEMBLOCK variants
  mm: remove include/linux/bootmem.h
  docs/boot-time-mm: remove bootmem documentation

 Documentation/core-api/boot-time-mm.rst | 71 +--
 arch/alpha/Kconfig                      |  2 -
 arch/alpha/kernel/core_cia.c            |  4 +-
 arch/alpha/kernel/core_irongate.c       |  4 +-
 arch/alpha/kernel/core_marvel.c         |  6 +-
 arch/alpha/kernel/core_titan.c          |  2 +-
 arch/alpha/kernel/core_tsunami.c        |  2 +-
 arch/alpha/kernel/pci-noop.c            |  6 +-
 arch/alpha/kernel/pci.c                 |  6 +-
 arch/alpha/kernel/pci_iommu.c           | 14 +-
 arch/alpha/kernel/setup.c               |  3 +-
 arch/alpha/kernel/sys_nautilus.c        |  2 +-
 arch/alpha/mm/init.c                    |  4 +-
 arch/alpha/mm/numa.c                    |  1 -
 arch/arc/Kconfig                        |  2 -
 arch/arc/kernel/unwind.c                |  6 +-
 arch/arc/mm/highmem.c                   |  4 +-
 arch/arc/mm/init.c                      |  3 +-
 arch/arm/Kconfig                        |  2 -
 arch/arm/kernel/devtree.c               |  1 -
 arch/arm/kernel/setup.c                 |  5 +-
 arch/arm/mach-omap2/omap_hwmod.c        |  8 +-
 arch/arm/mm/dma-mapping.c               |  1 -
 arch/arm/mm/init.c                      |  3 +-
 arch/arm/mm/mmu.c                       |  2 +-
 arch/arm/xen/mm.c                       |  1 -
 arch/arm/xen/p2m.c                      |  2 +-
 arch/arm64/Kconfig                      |  2 -
 arch/arm64/kernel/acpi.c                |  1 -
 arch/arm64/kernel/acpi_numa.c           |  1 -
 arch/arm64/kernel/setup.c               |  3 +-
 arch/arm64/mm/dma-mapping.c             |  2 +-
 arch/arm64/mm/init.c                    |  5 +-
 arch/arm64/mm/kasan_init.c              |  3 +-
 arch/arm64/mm/mmu.c
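The two MIPS items in the changes-since-RFC list (limit allocations to max_low_pfn, use bottom-up mode for the exceptions base) can be illustrated with a toy search over a single free range. This is only a sketch under stated assumptions: toy_find() and its parameters are hypothetical, it models one contiguous range with power-of-two alignment, and it is not the memblock API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round x down/up to a power-of-two alignment a. */
static uint64_t align_down(uint64_t x, uint64_t a) { return x & ~(a - 1); }
static uint64_t align_up(uint64_t x, uint64_t a) { return (x + a - 1) & ~(a - 1); }

/*
 * Hypothetical helper, not a kernel function: find 'size' bytes inside the
 * free range [base, end) without going above 'limit'.
 * Returns the chosen address, or 0 on failure (assumes base > 0).
 */
uint64_t toy_find(uint64_t base, uint64_t end, uint64_t limit,
		  uint64_t size, uint64_t align, bool bottom_up)
{
	uint64_t top = end < limit ? end : limit;	/* highest usable address */
	uint64_t addr;

	assert(align != 0 && (align & (align - 1)) == 0);
	if (size == 0 || top <= base || size > top - base)
		return 0;

	if (bottom_up)
		addr = align_up(base, align);		/* lowest fitting address */
	else
		addr = align_down(top - size, align);	/* highest fitting address */

	if (addr < base || addr > top - size)
		return 0;
	return addr;
}
```

With free memory at [0x1000, 0x40000000) and a 512M limit, a top-down search for one 4K page lands just under the limit, while a bottom-up search lands at the start of the range. Without the limit, the top-down result would come from the top of the range instead.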
[PATCH] serial: cpm_uart: return immediately from console poll
kgdb expects the poll function to return immediately, returning
NO_POLL_CHAR when no character is available.

Fixes: f5316b4aea024 ("kgdb,8250,pl011: Return immediately from console poll")
Cc: Jason Wessel
Cc:
Signed-off-by: Christophe Leroy
---
 drivers/tty/serial/cpm_uart/cpm_uart_core.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/serial/cpm_uart/cpm_uart_core.c b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
index cd3f3fc4e0a5..280acc4dfa90 100644
--- a/drivers/tty/serial/cpm_uart/cpm_uart_core.c
+++ b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
@@ -1093,8 +1093,8 @@ static int poll_wait_key(char *obuf, struct uart_cpm_port *pinfo)
 	/* Get the address of the host memory buffer.
 	 */
 	bdp = pinfo->rx_cur;
-	while (bdp->cbd_sc & BD_SC_EMPTY)
-		;
+	if (bdp->cbd_sc & BD_SC_EMPTY)
+		return NO_POLL_CHAR;
 
 	/* If the buffer address is in the CPM DPRAM, don't
 	 * convert it.
@@ -1129,7 +1129,11 @@ static int cpm_get_poll_char(struct uart_port *port)
 		poll_chars = 0;
 	}
 	if (poll_chars <= 0) {
-		poll_chars = poll_wait_key(poll_buf, pinfo);
+		int ret = poll_wait_key(poll_buf, pinfo);
+
+		if (ret == NO_POLL_CHAR)
+			return ret;
+		poll_chars = ret;
 		pollp = poll_buf;
 	}
 	poll_chars--;
-- 
2.13.3
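The behavioural change in this patch (report "nothing yet" instead of spinning) can be shown in isolation. The following is a minimal user-space sketch, not the driver code: struct toy_bd, TOY_BD_EMPTY and TOY_NO_POLL_CHAR are hypothetical stand-ins for the buffer descriptor, BD_SC_EMPTY and NO_POLL_CHAR, and their values are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BD_EMPTY     0x8000u             /* assumed "no data received" status bit */
#define TOY_NO_POLL_CHAR ((unsigned char)-1) /* assumed sentinel for "no character" */

/* Hypothetical one-character receive descriptor. */
struct toy_bd {
	uint16_t status;
	unsigned char data;
};

/*
 * Non-blocking poll: return the pending character, or TOY_NO_POLL_CHAR
 * straight away when the descriptor is still empty. The old behaviour
 * would have been a busy-wait loop on the status bit at this point.
 */
int toy_poll_get_char(struct toy_bd *bd)
{
	if (bd->status & TOY_BD_EMPTY)
		return TOY_NO_POLL_CHAR;

	bd->status |= TOY_BD_EMPTY;	/* mark the descriptor as consumed */
	return bd->data;
}
```

A caller such as a debugger loop can then interleave its own work with polling, which is the point of returning immediately. One consequence of this sentinel choice in the sketch is that a received byte equal to the sentinel cannot be told apart from "no character".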
Re: [PATCH 2/3] powerpc: Add system call table generation support
On Fri, Sep 14, 2018 at 10:33 AM Firoz Khan wrote:

> ---
>  arch/powerpc/kernel/syscalls/Makefile       |  51
>  arch/powerpc/kernel/syscalls/syscall_32.tbl | 378
>  arch/powerpc/kernel/syscalls/syscall_64.tbl | 372 +++
>  arch/powerpc/kernel/syscalls/syscallhdr.sh  |  37 +++
>  arch/powerpc/kernel/syscalls/syscalltbl.sh  |  38 +++

I think you should only need a single .tbl input file here.

> +
> +systbl_abi_syscall_table_32 := 32
> +$(out)/syscall_table_32.h: $(syscall32) $(systbl)
> +	$(call if_changed,systbl)
> +
> +systbl_abi_syscall_table_64 := 64
> +$(out)/syscall_table_64.h: $(syscall64) $(systbl)
> +	$(call if_changed,systbl)
> +
> +systbl_abi_syscall_table_c32 := c32
> +$(out)/syscall_table_c32.h: $(syscall32) $(systbl)
> +	$(call if_changed,systbl)

And here you need a fourth output file for the SPU table on ppc64.

> +383 common statx sys_statx
> +384 common pkey_alloc sys_pkey_alloc
> +385 common pkey_free sys_pkey_free
> +386 common pkey_mprotect sys_pkey_mprotect

This also misses rseq and io_pgetevents.

      Arnd
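To make the "single .tbl input file" suggestion concrete, here is a rough sketch of how one table could feed several outputs by filtering on the ABI column. It is an illustration only: toy_row_number() is a hypothetical helper, the row layout (number, ABI, name, entry point) is taken from the quoted rows, and the ABI names other than "common" are assumptions. The real generation in the patch is done by shell scripts, not C.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical filter: parse one table row of the form
 *   <number> <abi> <name> <entry point>
 * and return the syscall number if the row applies to the requested ABI
 * (its ABI column is "common" or equals 'abi'), otherwise -1.
 */
int toy_row_number(const char *row, const char *abi)
{
	int nr;
	char row_abi[16], name[64], entry[64];

	/* Field widths keep sscanf() inside the buffers above. */
	if (sscanf(row, "%d %15s %63s %63s", &nr, row_abi, name, entry) != 4)
		return -1;
	if (strcmp(row_abi, "common") != 0 && strcmp(row_abi, abi) != 0)
		return -1;
	return nr;
}
```

Running every row through such a filter once per output ABI would yield the per-ABI tables from one shared input, with only the rows that really differ carrying a specific ABI tag.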
[PATCH v2 5/5] arm64: dts: add LX2160ARDB board support
The LX2160A reference design board (RDB) is a high-performance computing,
evaluation, and development platform with the LX2160A SoC.

Signed-off-by: Priyanka Jain
Signed-off-by: Sriram Dash
Signed-off-by: Vabhav Sharma
---
 arch/arm64/boot/dts/freescale/Makefile            |  1 +
 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts | 88 +++
 2 files changed, 89 insertions(+)
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts

diff --git a/arch/arm64/boot/dts/freescale/Makefile b/arch/arm64/boot/dts/freescale/Makefile
index 86e18ad..445b72b 100644
--- a/arch/arm64/boot/dts/freescale/Makefile
+++ b/arch/arm64/boot/dts/freescale/Makefile
@@ -13,3 +13,4 @@ dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-rdb.dtb
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-simu.dtb
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-qds.dtb
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-rdb.dtb
+dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-rdb.dtb
diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts b/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
new file mode 100644
index 000..1bbe663
--- /dev/null
+++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
@@ -0,0 +1,88 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+//
+// Device Tree file for LX2160ARDB
+//
+// Copyright 2018 NXP
+
+/dts-v1/;
+
+#include "fsl-lx2160a.dtsi"
+
+/ {
+	model = "NXP Layerscape LX2160ARDB";
+	compatible = "fsl,lx2160a-rdb", "fsl,lx2160a";
+
+	chosen {
+		stdout-path = "serial0:115200n8";
+	};
+};
+
+ {
+	status = "okay";
+};
+
+ {
+	status = "okay";
+};
+
+ {
+	status = "okay";
+	i2c-mux@77 {
+		compatible = "nxp,pca9547";
+		reg = <0x77>;
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		i2c@2 {
+			#address-cells = <1>;
+			#size-cells = <0>;
+			reg = <0x2>;
+
+			power-monitor@40 {
+				compatible = "ti,ina220";
+				reg = <0x40>;
+				shunt-resistor = <1000>;
+			};
+		};
+
+		i2c@3 {
+			#address-cells = <1>;
+			#size-cells = <0>;
+			reg = <0x3>;
+
+			temperature-sensor@4c {
+				compatible = "nxp,sa56004";
+				reg = <0x4c>;
+			};
+
+			temperature-sensor@4d {
+				compatible = "nxp,sa56004";
+				reg = <0x4d>;
+			};
+		};
+	};
+};
+
+ {
+	status = "okay";
+
+	rtc@51 {
+		compatible = "nxp,pcf2129";
+		reg = <0x51>;
+		// IRQ10_B
+		interrupts = <0 150 0x4>;
+	};
+
+};
+
+ {
+	status = "okay";
+};
+
+ {
+	status = "okay";
+};
+
+ {
+	status = "okay";
+};
-- 
2.7.4
[PATCH v2 4/5] arm64: dts: add QorIQ LX2160A SoC support
The LX2160A SoC is based on the Layerscape Chassis Generation 3.2
architecture.

The LX2160A features 16 advanced 64-bit ARMv8 Cortex-A72 processor cores in
8 clusters, CCN508, GICv3, two 64-bit DDR4 memory controllers, 8 I2C
controllers, 3 DSPI, 2 eSDHC, 2 USB 3.0, MMU-500, 3 SATA, 4 PL011 SBSA
UARTs, etc.

Signed-off-by: Ramneek Mehresh
Signed-off-by: Zhang Ying-22455
Signed-off-by: Nipun Gupta
Signed-off-by: Priyanka Jain
Signed-off-by: Yogesh Gaur
Signed-off-by: Sriram Dash
Signed-off-by: Vabhav Sharma
---
 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi | 693 +
 1 file changed, 693 insertions(+)
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi

diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
new file mode 100644
index 000..46eea16
--- /dev/null
+++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
@@ -0,0 +1,693 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+//
+// Device Tree Include file for Layerscape-LX2160A family SoC.
+//
+// Copyright 2018 NXP
+
+#include
+
+/memreserve/ 0x8000 0x0001;
+
+/ {
+	compatible = "fsl,lx2160a";
+	interrupt-parent = <>;
+	#address-cells = <2>;
+	#size-cells = <2>;
+
+	cpus {
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		// 8 clusters having 2 Cortex-A72 cores each
+		cpu@0 {
+			device_type = "cpu";
+			compatible = "arm,cortex-a72";
+			reg = <0x0>;
+			clocks = < 1 0>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			i-cache-size = <0xC000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <192>;
+			next-level-cache = <_l2>;
+		};
+
+		cpu@1 {
+			device_type = "cpu";
+			compatible = "arm,cortex-a72";
+			reg = <0x1>;
+			clocks = < 1 0>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			i-cache-size = <0xC000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <192>;
+			next-level-cache = <_l2>;
+		};
+
+		cpu@100 {
+			device_type = "cpu";
+			compatible = "arm,cortex-a72";
+			reg = <0x100>;
+			clocks = < 1 1>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			i-cache-size = <0xC000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <192>;
+			next-level-cache = <_l2>;
+		};
+
+		cpu@101 {
+			device_type = "cpu";
+			compatible = "arm,cortex-a72";
+			reg = <0x101>;
+			clocks = < 1 1>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			i-cache-size = <0xC000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <192>;
+			next-level-cache = <_l2>;
+		};
+
+		cpu@200 {
+			device_type = "cpu";
+			compatible = "arm,cortex-a72";
+			reg = <0x200>;
+			clocks = < 1 2>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			i-cache-size = <0xC000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <192>;
+			next-level-cache = <_l2>;
+		};
+
+		cpu@201 {
+			device_type = "cpu";
+			compatible = "arm,cortex-a72";
+			reg = <0x201>;
+			clocks = < 1 2>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			i-cache-size = <0xC000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <192>;
+			next-level-cache = <_l2>;
+		};
+
+		cpu@300 {
+			device_type = "cpu";
+			compatible = "arm,cortex-a72";
+			reg = <0x300>;
+			clocks
[PATCH v2 3/5] drivers: clk-qoriq: Add clockgen support for lx2160a
From: Yogesh Gaur

Add clockgen support for lx2160a.
Added entry for compat 'fsl,lx2160a-clockgen'.

Signed-off-by: Tang Yuantian
Signed-off-by: Yogesh Gaur
Signed-off-by: Vabhav Sharma
Acked-by: Stephen Boyd
---
 drivers/clk/clk-qoriq.c         | 14 +-
 drivers/cpufreq/qoriq-cpufreq.c |  1 +
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/clk/clk-qoriq.c b/drivers/clk/clk-qoriq.c
index 3a1812f..e9ae70b 100644
--- a/drivers/clk/clk-qoriq.c
+++ b/drivers/clk/clk-qoriq.c
@@ -79,7 +79,7 @@ struct clockgen_chipinfo {
 	const struct clockgen_muxinfo *cmux_groups[2];
 	const struct clockgen_muxinfo *hwaccel[NUM_HWACCEL];
 	void (*init_periph)(struct clockgen *cg);
-	int cmux_to_group[NUM_CMUX]; /* -1 terminates if fewer than NUM_CMUX */
+	int cmux_to_group[NUM_CMUX+1]; /* -1 terminate if fewer to NUM_CMUX+1 */
 	u32 pll_mask;	/* 1 << n bit set if PLL n is valid */
 	u32 flags;	/* CG_xxx */
 };
@@ -570,6 +570,17 @@ static const struct clockgen_chipinfo chipinfo[] = {
 		.flags = CG_VER3 | CG_LITTLE_ENDIAN,
 	},
 	{
+		.compat = "fsl,lx2160a-clockgen",
+		.cmux_groups = {
+			_cmux_cga12, _cmux_cgb
+		},
+		.cmux_to_group = {
+			0, 0, 0, 0, 1, 1, 1, 1, -1
+		},
+		.pll_mask = 0x37,
+		.flags = CG_VER3 | CG_LITTLE_ENDIAN,
+	},
+	{
 		.compat = "fsl,p2041-clockgen",
 		.guts_compat = "fsl,qoriq-device-config-1.0",
 		.init_periph = p2041_init_periph,
@@ -1424,6 +1435,7 @@ CLK_OF_DECLARE(qoriq_clockgen_ls1043a, "fsl,ls1043a-clockgen", clockgen_init);
 CLK_OF_DECLARE(qoriq_clockgen_ls1046a, "fsl,ls1046a-clockgen", clockgen_init);
 CLK_OF_DECLARE(qoriq_clockgen_ls1088a, "fsl,ls1088a-clockgen", clockgen_init);
 CLK_OF_DECLARE(qoriq_clockgen_ls2080a, "fsl,ls2080a-clockgen", clockgen_init);
+CLK_OF_DECLARE(qoriq_clockgen_lx2160a, "fsl,lx2160a-clockgen", clockgen_init);
 
 /* Legacy nodes */
 CLK_OF_DECLARE(qoriq_sysclk_1, "fsl,qoriq-sysclk-1.0", sysclk_init);
diff --git a/drivers/cpufreq/qoriq-cpufreq.c b/drivers/cpufreq/qoriq-cpufreq.c
index 3d773f6..83921b7 100644
--- a/drivers/cpufreq/qoriq-cpufreq.c
+++ b/drivers/cpufreq/qoriq-cpufreq.c
@@ -295,6 +295,7 @@ static const struct of_device_id node_matches[] __initconst = {
 	{ .compatible = "fsl,ls1046a-clockgen", },
 	{ .compatible = "fsl,ls1088a-clockgen", },
 	{ .compatible = "fsl,ls2080a-clockgen", },
+	{ .compatible = "fsl,lx2160a-clockgen", },
 	{ .compatible = "fsl,p4080-clockgen", },
 	{ .compatible = "fsl,qoriq-clockgen-1.0", },
 	{ .compatible = "fsl,qoriq-clockgen-2.0", },
-- 
2.7.4
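One way to read the cmux_to_group[NUM_CMUX+1] sizing in this patch: if a scan relies on the -1 terminator and a chip uses all NUM_CMUX entries, the terminator needs a slot of its own. The sketch below shows that idea in isolation. It is not the clk-qoriq code; TOY_NUM_CMUX and toy_count_cmux() are hypothetical names, and the value 8 is taken from the v2 changelog.

```c
#include <assert.h>

#define TOY_NUM_CMUX 8	/* assumed to mirror NUM_CMUX (8 per the v2 changelog) */

/*
 * Count the valid entries of a -1 terminated group table that has room for
 * TOY_NUM_CMUX entries plus the terminator.
 * Returns the number of entries, or -1 if no terminator is found.
 */
int toy_count_cmux(const int groups[TOY_NUM_CMUX + 1])
{
	int n;

	/* n may reach TOY_NUM_CMUX: that slot can only hold the terminator. */
	for (n = 0; n <= TOY_NUM_CMUX; n++) {
		if (groups[n] < 0)
			return n;
	}
	return -1;
}
```

With the table from the patch, { 0, 0, 0, 0, 1, 1, 1, 1, -1 }, the scan reports eight muxes and never reads past the array, which an eight-slot array could not guarantee for a terminator-only scan.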
[PATCH v2 2/5] soc/fsl/guts: Add compatible string for LX2160A
Add the compatible string "lx2160a-dcfg" to initialize the guts driver for
LX2160A.

Signed-off-by: Vabhav Sharma
---
 drivers/soc/fsl/guts.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c
index 302e0c8..5e1e633 100644
--- a/drivers/soc/fsl/guts.c
+++ b/drivers/soc/fsl/guts.c
@@ -222,6 +222,7 @@ static const struct of_device_id fsl_guts_of_match[] = {
 	{ .compatible = "fsl,ls1088a-dcfg", },
 	{ .compatible = "fsl,ls1012a-dcfg", },
 	{ .compatible = "fsl,ls1046a-dcfg", },
+	{ .compatible = "fsl,lx2160a-dcfg", },
 	{}
 };
 MODULE_DEVICE_TABLE(of, fsl_guts_of_match);
-- 
2.7.4
[PATCH v2 1/5] dt-bindings: arm64: add compatible for LX2160A
Add compatible strings for the LX2160A SoC and the QDS and RDB boards.

Signed-off-by: Vabhav Sharma
---
 Documentation/devicetree/bindings/arm/fsl.txt | 12
 1 file changed, 12 insertions(+)

diff --git a/Documentation/devicetree/bindings/arm/fsl.txt b/Documentation/devicetree/bindings/arm/fsl.txt
index cdb9dd7..76256bd 100644
--- a/Documentation/devicetree/bindings/arm/fsl.txt
+++ b/Documentation/devicetree/bindings/arm/fsl.txt
@@ -218,3 +218,15 @@ Required root node properties:
 LS2088A ARMv8 based RDB Board
 Required root node properties:
     - compatible = "fsl,ls2088a-rdb", "fsl,ls2088a";
+
+LX2160A SoC
+Required root node properties:
+    - compatible = "fsl,lx2160a";
+
+LX2160A ARMv8 based QDS Board
+Required root node properties:
+    - compatible = "fsl,lx2160a-qds", "fsl,lx2160a";
+
+LX2160A ARMv8 based RDB Board
+Required root node properties:
+    - compatible = "fsl,lx2160a-rdb", "fsl,lx2160a";
-- 
2.7.4
[PATCH v2 0/5] arm64: dts: NXP: add basic dts file for LX2160A SoC
Changes for v2:
- Modified cmux_to_group array to include -1 terminator
- Revert NUM_CMUX to original value 8 from 16
- Remove "As LX2160A is 16 core, so modified value for NUM_CMUX" in patch
  "[PATCH 3/5] drivers: clk-qoriq: Add clockgen support for lx2160a"
  description
- Populated cache properties for L1 and L2 cache in lx2160a device-tree.
- Removed reboot node from lx2160a device-tree as PSCI is implemented.
- Removed incorrect comment for timer node interrupt property in lx2160a
  device-tree.
- Modified pmu node compatible property from "arm,armv8-pmuv3" to
  "arm,cortex-a72-pmu" in lx2160a device-tree
- Non-standard aliases removed in lx2160a rdb board device-tree
- Updated i2c child nodes to generic name in lx2160a rdb device-tree.

Changes for v1:
- Add compatible string for LX2160A clockgen support
- Add compatible string to initialize LX2160A guts driver
- Add compatible string for LX2160A support in dt-bindings
- Add dts file to enable support for LX2160A SoC and LX2160A RDB
  (Reference design board)

Vabhav Sharma (4):
  dt-bindings: arm64: add compatible for LX2160A
  soc/fsl/guts: Add compatible string for LX2160A
  arm64: dts: add QorIQ LX2160A SoC support
  arm64: dts: add LX2160ARDB board support

Yogesh Gaur (1):
  drivers: clk-qoriq: Add clockgen support for lx2160a

 Documentation/devicetree/bindings/arm/fsl.txt     |  12 +
 arch/arm64/boot/dts/freescale/Makefile            |   1 +
 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts |  88 +++
 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi    | 693 ++
 drivers/clk/clk-qoriq.c                           |  14 +-
 drivers/cpufreq/qoriq-cpufreq.c                   |   1 +
 drivers/soc/fsl/guts.c                            |   1 +
 7 files changed, 809 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi

-- 
2.7.4