Re: [RFC Part1 PATCH v3 10/17] resource: Provide resource struct in resource walk callback
On Mon, Jul 24, 2017 at 12:07 PM, Brijesh Singh wrote:
> From: Tom Lendacky
>
> In prep for a new function that will need additional resource information
> during the resource walk, update the resource walk callback to pass the
> resource structure. Since the current callback start and end arguments
> are pulled from the resource structure, the callback functions can obtain
> them from the resource structure directly.
>
> Signed-off-by: Tom Lendacky
> Signed-off-by: Brijesh Singh

This is a nice clean up even without the refactoring need. :)

Reviewed-by: Kees Cook

Thanks!

-Kees

-- 
Kees Cook
Pixel Security
Re: [Jfs-discussion] [PATCH] fs: convert a pile of fsync routines to errseq_t based reporting
On 07/28/2017 09:23 AM, Jeff Layton wrote:
> From: Jeff Layton
>
> This patch converts most of the in-kernel filesystems that do writeback
> out of the pagecache to report errors using the errseq_t-based
> infrastructure that was recently added. This allows them to report
> errors once for each open file description.
>
> Most filesystems have a fairly straightforward fsync operation. They
> call filemap_write_and_wait_range to write back all of the data and
> wait on it, and then (sometimes) sync out the metadata.
>
> For those filesystems this is a straightforward conversion from calling
> filemap_write_and_wait_range in their fsync operation to calling
> file_write_and_wait_range.
>
> Signed-off-by: Jeff Layton

Acked-by: Dave Kleikamp (for jfs)

> ---
>  arch/powerpc/platforms/cell/spufs/file.c   | 2 +-
>  drivers/staging/lustre/lustre/llite/file.c | 2 +-
>  drivers/video/fbdev/core/fb_defio.c        | 2 +-
>  fs/9p/vfs_file.c                           | 4 ++--
>  fs/affs/file.c                             | 2 +-
>  fs/afs/write.c                             | 2 +-
>  fs/cifs/file.c                             | 4 ++--
>  fs/exofs/file.c                            | 2 +-
>  fs/f2fs/file.c                             | 2 +-
>  fs/hfs/inode.c                             | 2 +-
>  fs/hfsplus/inode.c                         | 2 +-
>  fs/hostfs/hostfs_kern.c                    | 2 +-
>  fs/hpfs/file.c                             | 2 +-
>  fs/jffs2/file.c                            | 2 +-
>  fs/jfs/file.c                              | 2 +-
>  fs/ncpfs/file.c                            | 2 +-
>  fs/ntfs/dir.c                              | 2 +-
>  fs/ntfs/file.c                             | 2 +-
>  fs/ocfs2/file.c                            | 2 +-
>  fs/reiserfs/dir.c                          | 2 +-
>  fs/reiserfs/file.c                         | 2 +-
>  fs/ubifs/file.c                            | 2 +-
>  22 files changed, 24 insertions(+), 24 deletions(-)
>
> Rolling up all of these conversions into a single patch, as Christoph
> Hellwig suggested. Most of these are not tested, but the conversion
> here is fairly straightforward.
>
> Any maintainers who object, please let me know and I'll yank that
> part out of this patch.
> > diff --git a/arch/powerpc/platforms/cell/spufs/file.c > b/arch/powerpc/platforms/cell/spufs/file.c > index ae2f740a82f1..5ffcdeb1eb17 100644 > --- a/arch/powerpc/platforms/cell/spufs/file.c > +++ b/arch/powerpc/platforms/cell/spufs/file.c > @@ -1749,7 +1749,7 @@ static int spufs_mfc_flush(struct file *file, > fl_owner_t id) > static int spufs_mfc_fsync(struct file *file, loff_t start, loff_t end, int > datasync) > { > struct inode *inode = file_inode(file); > - int err = filemap_write_and_wait_range(inode->i_mapping, start, end); > + int err = file_write_and_wait_range(file, start, end); > if (!err) { > inode_lock(inode); > err = spufs_mfc_flush(file, NULL); > diff --git a/drivers/staging/lustre/lustre/llite/file.c > b/drivers/staging/lustre/lustre/llite/file.c > index ab1c85c1ed38..f7d07735ac66 100644 > --- a/drivers/staging/lustre/lustre/llite/file.c > +++ b/drivers/staging/lustre/lustre/llite/file.c > @@ -2364,7 +2364,7 @@ int ll_fsync(struct file *file, loff_t start, loff_t > end, int datasync) > PFID(ll_inode2fid(inode)), inode); > ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_FSYNC, 1); > > - rc = filemap_write_and_wait_range(inode->i_mapping, start, end); > + rc = file_write_and_wait_range(file, start, end); > inode_lock(inode); > > /* catch async errors that were recorded back when async writeback > diff --git a/drivers/video/fbdev/core/fb_defio.c > b/drivers/video/fbdev/core/fb_defio.c > index 37f69c061210..487d5e336e1b 100644 > --- a/drivers/video/fbdev/core/fb_defio.c > +++ b/drivers/video/fbdev/core/fb_defio.c > @@ -69,7 +69,7 @@ int fb_deferred_io_fsync(struct file *file, loff_t start, > loff_t end, int datasy > { > struct fb_info *info = file->private_data; > struct inode *inode = file_inode(file); > - int err = filemap_write_and_wait_range(inode->i_mapping, start, end); > + int err = file_write_and_wait_range(file, start, end); > if (err) > return err; > > diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c > index 3de3b4a89d89..4802d75b3cf7 100644 > 
--- a/fs/9p/vfs_file.c > +++ b/fs/9p/vfs_file.c > @@ -445,7 +445,7 @@ static int v9fs_file_fsync(struct file *filp, loff_t > start, loff_t end, > struct p9_wstat wstat; > int retval; > > - retval = filemap_write_and_wait_range(inode->i_mapping, start, end); > + retval = file_write_and_wait_range(filp, start, end); > if (retval) > return retval; > > @@ -468,7 +468,7 @@ int v9fs_file_fsync_dotl(struct file *filp, loff_t start, > loff_t end, > struct inode
[RFC PATCH 3/3] powerpc/mm: Separate LMB information from device tree format
With the upcoming introduction of a new device tree property format for
memory (ibm,dynamic-memory-v2), we should separate the data for the LMBs
on a system from the device tree format used to represent them. Without
doing this we face the task of having to update each piece of the kernel
that wants LMB information to know how to parse all possible device tree
formats.

This patch solves this by creating an array of LMB information at boot
time. This array will hold all relevant information presented in the
device tree for each LMB; base address, drc index, associativity array
index, and flags. Any code that needs LMB information can now use the
array directly, without having to determine which version of the device
tree format is in use and know how to parse each one.

Signed-off-by: Nathan Fontenot
---
 arch/powerpc/include/asm/lmb.h |   44 ++
 arch/powerpc/mm/Makefile       |    2 
 arch/powerpc/mm/lmb.c          |  146 
 arch/powerpc/mm/numa.c         |  181 +++-
 4 files changed, 221 insertions(+), 152 deletions(-)
 create mode 100644 arch/powerpc/include/asm/lmb.h
 create mode 100644 arch/powerpc/mm/lmb.c

diff --git a/arch/powerpc/include/asm/lmb.h b/arch/powerpc/include/asm/lmb.h
new file mode 100644
index 000..7ff2fa6
--- /dev/null
+++ b/arch/powerpc/include/asm/lmb.h
@@ -0,0 +1,44 @@
+/*
+ * lmb.h: Power specific logical memory block representation
+ *
+ * ** Add (C) **
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _ASM_POWERPC_LMB_H
+#define _ASM_POWERPC_LMB_H
+
+extern struct lmb_data *lmb_array;
+extern int n_mem_addr_cells, n_mem_size_cells;
+
+struct lmb {
+	u64	base_address;
+	u32	drc_index;
+	u32	aa_index;
+	u32	flags;
+};
+
+struct lmb_data {
+	struct lmb	*lmbs;
+	int		num_lmbs;
+	u32		lmb_size;
+};
+
+extern struct lmb_data *lmb_array;
+
+#define for_each_lmb(_lmb)					\
+	for (_lmb = &lmb_array->lmbs[0];			\
+	     _lmb != &lmb_array->lmbs[lmb_array->num_lmbs];	\
+	     _lmb++)
+
+extern int lmb_init(void);
+extern u32 lmb_get_lmb_size(void);
+extern u64 lmb_get_max_memory(void);
+extern unsigned long read_n_cells(int n, const __be32 **buf);
+extern void get_n_mem_cells(int *n_addr_cells, int *n_size_cells);
+
+#endif

diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 7414034..56c2591 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -28,7 +28,7 @@ obj-$(CONFIG_40x)		+= 40x_mmu.o
 obj-$(CONFIG_44x)		+= 44x_mmu.o
 obj-$(CONFIG_PPC_8xx)		+= 8xx_mmu.o
 obj-$(CONFIG_PPC_FSL_BOOK3E)	+= fsl_booke_mmu.o
-obj-$(CONFIG_NEED_MULTIPLE_NODES) += numa.o
+obj-$(CONFIG_NEED_MULTIPLE_NODES) += numa.o lmb.o
 obj-$(CONFIG_PPC_SPLPAR)	+= vphn.o
 obj-$(CONFIG_PPC_MM_SLICES)	+= slice.o
 obj-y				+= hugetlbpage.o

diff --git a/arch/powerpc/mm/lmb.c b/arch/powerpc/mm/lmb.c
new file mode 100644
index 000..e12e5be
--- /dev/null
+++ b/arch/powerpc/mm/lmb.c
@@ -0,0 +1,146 @@
+/*
+ * pSeries LMB support
+ *
+ * ** Add (C)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "lmb: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct lmb_data __lmb_data;
+struct lmb_data *lmb_array = &__lmb_data;
+int n_mem_addr_cells, n_mem_size_cells;
+
+unsigned long read_n_cells(int n, const __be32 **buf)
+{
+	unsigned long result = 0;
+
+	while (n--) {
+		result = (result << 32) | of_read_number(*buf, 1);
+		(*buf)++;
+	}
+	return result;
+}
+
+void __init get_n_mem_cells(int *n_addr_cells, int *n_size_cells)
+{
+	struct device_node *memory = NULL;
+
+	memory = of_find_node_by_type(memory, "memory");
+	if (!memory)
+		panic("numa.c: No memory nodes found!");
+
+	*n_addr_cells = of_n_addr_cells(memory);
+	*n_size_cells = of_n_size_cells(memory);
+	of_node_put(memory);
+}
+
+u32 lmb_get_lmb_size(void)
+{
+	return lmb_array->lmb_size;
+}
+
+u64 lmb_get_max_memory(void)
+{
+	u32 last_index = lmb_array->num_lmbs - 1;
+
+	return lmb_array->lmbs[last_index].base_address + lmb_array->lmb_size;
+}
+
+/*
+ * Retrieve and validate the ibm,dynamic-memory property of the device tree.
+ *
+ * The layout of the ibm,dynamic-memory property is a number N of memblock
[RFC PATCH 2/3] powerpc/numa: Get device node when retrieving usm memory
When we move to using the kernel lmb structs instead of accessing the
device tree directly for LMB information we will no longer have a
pointer to the device node for memory to pass to of_get_usable_memory().

This patch updates of_get_usable_memory() to no longer take a device
node pointer and to look up the memory device node itself.

Signed-off-by: Nathan Fontenot
---
 arch/powerpc/mm/numa.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 9c21953..24d9299 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -184,11 +184,19 @@ static const __be32 *of_get_associativity(struct device_node *dev)
  * it exists (the property exists only in kexec/kdump kernels,
  * added by kexec-tools)
  */
-static const __be32 *of_get_usable_memory(struct device_node *memory)
+static const __be32 *of_get_usable_memory(void)
 {
+	struct device_node *memory;
 	const __be32 *prop;
 	u32 len;
+
+	memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+	if (!memory)
+		return NULL;
+
 	prop = of_get_property(memory, "linux,drconf-usable-memory", &len);
+	of_node_put(memory);
+
 	if (!prop || len < sizeof(unsigned int))
 		return NULL;
 	return prop;
@@ -674,7 +682,7 @@ static void __init parse_drconf_memory(struct device_node *memory)
 		return;
 
 	/* check if this is a kexec/kdump kernel */
-	usm = of_get_usable_memory(memory);
+	usm = of_get_usable_memory();
 	if (usm != NULL)
 		is_kexec_kdump = 1;
[RFC PATCH 1/3] powerpc/numa: Get device node when retrieving associativity arrays
When we move to using the kernel lmb structs instead of accessing the
device tree directly for LMB information we will no longer have a
pointer to the device node for memory to pass to of_get_assoc_arrays().

This patch updates of_get_assoc_arrays() to no longer take a device node
pointer and to look up the memory device node itself.

Signed-off-by: Nathan Fontenot
---
 arch/powerpc/mm/numa.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 371792e..9c21953 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -466,19 +466,27 @@ struct assoc_arrays {
  * indicating the size of each associativity array, followed by a list
  * of N associativity arrays.
  */
-static int of_get_assoc_arrays(struct device_node *memory,
-			       struct assoc_arrays *aa)
+static int of_get_assoc_arrays(struct assoc_arrays *aa)
 {
+	struct device_node *memory;
 	const __be32 *prop;
 	u32 len;
 
+	memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
+	if (!memory)
+		return -1;
+
 	prop = of_get_property(memory, "ibm,associativity-lookup-arrays", &len);
-	if (!prop || len < 2 * sizeof(unsigned int))
+	if (!prop || len < 2 * sizeof(unsigned int)) {
+		of_node_put(memory);
 		return -1;
+	}
 
 	aa->n_arrays = of_read_number(prop++, 1);
 	aa->array_sz = of_read_number(prop++, 1);
 
+	of_node_put(memory);
+
 	/* Now that we know the number of arrays and size of each array,
 	 * revalidate the size of the property read in.
 	 */
@@ -661,7 +669,7 @@ static void __init parse_drconf_memory(struct device_node *memory)
 	if (!lmb_size)
 		return;
 
-	rc = of_get_assoc_arrays(memory, &aa);
+	rc = of_get_assoc_arrays(&aa);
 	if (rc)
 		return;
@@ -996,7 +1004,7 @@ static int hot_add_drconf_scn_to_nid(struct device_node *memory,
 	if (!lmb_size)
 		return -1;
 
-	rc = of_get_assoc_arrays(memory, &aa);
+	rc = of_get_assoc_arrays(&aa);
 	if (rc)
 		return -1;
[RFC PATCH 0/3] Separate LMB data from device tree format
With the upcoming introduction of a new device tree property format for
memory (ibm,dynamic-memory-v2), we should separate LMB data from the
device tree format used to represent it. Doing this allows any consumer
of LMB information, currently the mm/numa and pseries/hotplug-memory
code, to access it directly without having to worry about the device
tree format and how to parse it.

This patch set attempts to solve this by creating an array of LMB
information at boot time which holds all relevant data presented in the
device tree for each LMB; base address, drc index, associativity array
index, and flags.

The first two patches are small updates to two routines to have them
look up the memory device node instead of having it passed to them. This
is needed since the new code will not need to look at the device tree
when getting LMB information.

The third patch introduces the new LMB data array in lmb.h and the set
of routines needed to initialize and access the array. The data is
initialized from parse_numa_properties() and the routines in numa.c are
updated to use the new LMB array.

A few notes: This code has only been boot-tested. I would like to get
some feedback on this approach before venturing too far down this design
path.

I considered initializing the lmb array in prom.c to avoid having to
write a new routine there to parse the -v2 property. My concern is
allocating the array for this information that early in boot; the number
of LMBs can get very large on systems with 16 or 32 TB.

The code for memory DLPAR (pseries/hotplug-memory.c) still needs to be
updated to use the new LMB array.

Any thoughts, feedback, comments (good and bad) would be appreciated.
Thanks,

-Nathan

---

Nathan Fontenot (3):
      powerpc/numa: Get device node when retrieving associativity arrays
      powerpc/numa: Get device node when retrieving usm memory
      powerpc/mm: Separate LMB information from device tree format

 arch/powerpc/include/asm/lmb.h |   44 
 arch/powerpc/mm/Makefile       |    2 
 arch/powerpc/mm/lmb.c          |  146 
 arch/powerpc/mm/numa.c         |  211 ++--
 4 files changed, 244 insertions(+), 159 deletions(-)
 create mode 100644 arch/powerpc/include/asm/lmb.h
 create mode 100644 arch/powerpc/mm/lmb.c
Re: [PATCH v3] powerpc/powernv: Enable PCI peer-to-peer
Michael,

What do we need on this one before we can pull into your -next branch?

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center
[RFC PATCH] powerpc: Disabling MEMORY_HOTPLUG_DEFAULT_ONLINE option for PPC64 arch
Commit 943db62c316c ("powerpc/pseries: Revert 'Auto-online hotplugged memory'") reverted the auto-online feature for pseries due to problems with LMB removals not updating the device struct properly. Among other things, this commit made the following change in arch/powerpc/configs/pseries_defconfig: @@ -58,7 +58,6 @@ CONFIG_KEXEC_FILE=y CONFIG_IRQ_ALL_CPUS=y CONFIG_MEMORY_HOTPLUG=y CONFIG_MEMORY_HOTREMOVE=y -CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y CONFIG_KSM=y The intent was to disable the option in the defconfig of pseries, since after that the code doesn't have this support anymore. However, this change alone isn't enough to prevent situations such as [1], where distros can enable the option unaware of the consequences of doing it (e.g. breaking LMB hotplug altogether). Instead of relying on all distros knowing that pseries can't handle CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y after 943db62c316c, this patch changes mm/Kconfig to make the MEMORY_HOTPLUG_DEFAULT_ONLINE config unavailable for the PPC64 arch. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1476380 Fixes: 943db62c316c ("powerpc/pseries: Revert 'Auto-online hotplugged memory'") Signed-off-by: Daniel Henrique Barboza--- mm/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/Kconfig b/mm/Kconfig index 48b1af4..a342c77 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -169,7 +169,7 @@ config MEMORY_HOTPLUG_SPARSE config MEMORY_HOTPLUG_DEFAULT_ONLINE bool "Online the newly added memory blocks by default" default n -depends on MEMORY_HOTPLUG +depends on MEMORY_HOTPLUG && !PPC64 help This option sets the default policy setting for memory hotplug onlining policy (/sys/devices/system/memory/auto_online_blocks) which -- 2.9.4
Re: [RFC v6 13/62] powerpc: track allocation status of all pkeys
Ram Pai writes:

>  static inline int mm_pkey_free(struct mm_struct *mm, int pkey)
>  {
> -	return -EINVAL;
> +	if (!pkey_inited)
> +		return -1;

Sorry, I missed this earlier but the pkey_free syscall will pass this
value to userspace so it needs to be an errno as well (-EINVAL?).

> +
> +	if (!mm_pkey_is_allocated(mm, pkey))
> +		return -EINVAL;
> +
> +	mm_set_pkey_free(mm, pkey);
> +
> +	return 0;
>  }

-- 
Thiago Jung Bauermann
IBM Linux Technology Center
[PATCH 3/3] powerpc/xmon: Disable tracing on xmon by default
Currently tracing is enabled from inside xmon, which may add noise to
the tracing buffer and makes it harder to tell which entries in the
buffer come from non-xmon kernel functions and which are xmon 'noise'
(such as printk()s and terminal-function tracing).

This patch simply disables tracing by default, giving a better trace of
the failing functions just before the kernel gets into xmon.

Signed-off-by: Breno Leitao
---
 arch/powerpc/xmon/xmon.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 19276d2f2f25..b614cc3a3a65 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -89,7 +89,7 @@ static unsigned long nidump = 16;
 static unsigned long ncsum = 4096;
 static int termch;
 static char tmpstr[128];
-static char tracing_enabled = 1;
+static char tracing_enabled = 0;
 static long bus_error_jmp[JMP_BUF_LEN];
 static int catch_memory_errors;
 
@@ -463,6 +463,7 @@ static int xmon_core(struct pt_regs *regs, int fromipi)
 	local_irq_save(flags);
 	hard_irq_disable();
+	tracing_off();
 	bp = in_breakpoint_table(regs->nip, &offset);
 	if (bp != NULL) {
-- 
2.13.2
[PATCH 2/3] powerpc/xmon: Disable and enable tracing command
If tracing is enabled and you get into xmon, the tracing buffer continues to be updated, causing possible loss of data due to buffer overflow and unnecessary tracing information coming from xmon functions. This patch adds a new option that allows the tracing to be disabled and re-enabled from inside xmon. Signed-off-by: Breno Leitao--- arch/powerpc/xmon/xmon.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index 0cbd910193fa..19276d2f2f25 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -89,6 +89,7 @@ static unsigned long nidump = 16; static unsigned long ncsum = 4096; static int termch; static char tmpstr[128]; +static char tracing_enabled = 1; static long bus_error_jmp[JMP_BUF_LEN]; static int catch_memory_errors; @@ -268,6 +269,7 @@ Commands:\n\ Sr # read SPR #\n\ Sw #v write v to SPR #\n\ tprint backtrace\n\ + vtrace enable/disable\n\ xexit monitor and recover\n\ Xexit monitor and don't recover\n" #if defined(CONFIG_PPC64) && !defined(CONFIG_PPC_BOOK3E) @@ -983,6 +985,17 @@ cmds(struct pt_regs *excp) case 'x': case 'X': return cmd; + case 'v': + if (tracing_is_on()) { + printk("Disabling tracing\n"); + tracing_enabled = 0; + tracing_off(); + } else { + printk("Enabling tracing\n"); + tracing_enabled = 1; + tracing_on(); + } + break; case EOF: printf(" \n"); mdelay(2000); @@ -2353,7 +2366,8 @@ static void dump_tracing(void) else ftrace_dump(DUMP_ALL); - tracing_on(); + if (tracing_enabled) + tracing_on(); } static void dump_all_pacas(void) -- 2.13.2
[PATCH 1/3] powerpc/xmon: Dump ftrace buffers for the current CPU
The current xmon 'dt' command dumps the tracing buffer for all CPUs,
which can make the logs hard to read, since most powerpc machines now
have many CPUs and the per-CPU lines are interleaved in the ftrace log.
This new option dumps the ftrace buffer for the current CPU only.

Signed-off-by: Breno Leitao
---
 arch/powerpc/xmon/xmon.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 08e367e3e8c3..0cbd910193fa 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -234,6 +234,7 @@ Commands:\n\
 "\
   dr	dump stream of raw bytes\n\
   dt	dump the tracing buffers (uses printk)\n\
+  dtc	dump the tracing buffers for current CPU (uses printk)\n\
 "
 #ifdef CONFIG_PPC_POWERNV
 "  dx#   dump xive on CPU #\n\
@@ -2342,6 +2343,19 @@ static void dump_one_paca(int cpu)
 	sync();
 }
 
+static void dump_tracing(void)
+{
+	int c;
+
+	c = inchar();
+	if (c == 'c')
+		ftrace_dump(DUMP_ORIG);
+	else
+		ftrace_dump(DUMP_ALL);
+
+	tracing_on();
+}
+
 static void dump_all_pacas(void)
 {
 	int cpu;
@@ -2507,6 +2521,11 @@ dump(void)
 	}
 #endif
 
+	if (c == 't') {
+		dump_tracing();
+		return;
+	}
+
 	if (c == '\n')
 		termch = c;
 
@@ -2525,9 +2544,6 @@ dump(void)
 		dump_log_buf();
 	} else if (c == 'o') {
 		dump_opal_msglog();
-	} else if (c == 't') {
-		ftrace_dump(DUMP_ALL);
-		tracing_on();
 	} else if (c == 'r') {
 		scanhex(&ndump);
 		if (ndump == 0)
-- 
2.13.2
Re: [RFC v6 21/62] powerpc: introduce execute-only pkey
Ram Paiwrites: > On Fri, Jul 28, 2017 at 07:17:13PM -0300, Thiago Jung Bauermann wrote: >> >> Ram Pai writes: >> > --- a/arch/powerpc/mm/pkeys.c >> > +++ b/arch/powerpc/mm/pkeys.c >> > @@ -97,3 +97,60 @@ int __arch_set_user_pkey_access(struct task_struct >> > *tsk, int pkey, >> >init_iamr(pkey, new_iamr_bits); >> >return 0; >> > } >> > + >> > +static inline bool pkey_allows_readwrite(int pkey) >> > +{ >> > + int pkey_shift = pkeyshift(pkey); >> > + >> > + if (!(read_uamor() & (0x3UL << pkey_shift))) >> > + return true; >> > + >> > + return !(read_amr() & ((AMR_RD_BIT|AMR_WR_BIT) << pkey_shift)); >> > +} >> > + >> > +int __execute_only_pkey(struct mm_struct *mm) >> > +{ >> > + bool need_to_set_mm_pkey = false; >> > + int execute_only_pkey = mm->context.execute_only_pkey; >> > + int ret; >> > + >> > + /* Do we need to assign a pkey for mm's execute-only maps? */ >> > + if (execute_only_pkey == -1) { >> > + /* Go allocate one to use, which might fail */ >> > + execute_only_pkey = mm_pkey_alloc(mm); >> > + if (execute_only_pkey < 0) >> > + return -1; >> > + need_to_set_mm_pkey = true; >> > + } >> > + >> > + /* >> > + * We do not want to go through the relatively costly >> > + * dance to set AMR if we do not need to. Check it >> > + * first and assume that if the execute-only pkey is >> > + * readwrite-disabled than we do not have to set it >> > + * ourselves. >> > + */ >> > + if (!need_to_set_mm_pkey && >> > + !pkey_allows_readwrite(execute_only_pkey)) > ^ > Here uamor and amr is read once each. You are right. What confused me was that the call to mm_pkey_alloc above also reads uamor and amr (and also iamr, and writes to all of those) but if that function is called, then need_to_set_mm_pkey is true and pkey_allows_readwrite won't be called. >> > + return execute_only_pkey; >> > + >> > + /* >> > + * Set up AMR so that it denies access for everything >> > + * other than execution. 
>> > + */ >> > + ret = __arch_set_user_pkey_access(current, execute_only_pkey, >> > + (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)); > ^^^ > here amr and iamr are written once each if the > the function returns successfully. __arch_set_user_pkey_access also reads uamor for the second time in its call to is_pkey_enabled, and reads amr for the second time as well in its calls to init_amr. The first reads are in either pkey_allows_readwrite or pkey_status_change (called from __arch_activate_pkey). If need_to_set_mm_pkey is true, then the iamr read in init_iamr is the 2nd one during __execute_only_pkey's execution. In this case the writes to amr and iamr will be the 2nd ones as well. The first reads and writes are in pkey_status_change. >> > + /* >> > + * If the AMR-set operation failed somehow, just return >> > + * 0 and effectively disable execute-only support. >> > + */ >> > + if (ret) { >> > + mm_set_pkey_free(mm, execute_only_pkey); > ^^^ > here only if __arch_set_user_pkey_access() fails > amr and iamr and uamor will be written once each. I assume the error case isn't perfomance sensitive and didn't account for mm_set_pkey_free in my analysis. >> > + return -1; >> > + } >> > + >> > + /* We got one, store it and use it from here on out */ >> > + if (need_to_set_mm_pkey) >> > + mm->context.execute_only_pkey = execute_only_pkey; >> > + return execute_only_pkey; >> > +} >> >> If you follow the code flow in __execute_only_pkey, the AMR and UAMOR >> are read 3 times in total, and AMR is written twice. IAMR is read and >> written twice. Since they are SPRs and access to them is slow (or isn't >> it?), is it worth it to read them once in __execute_only_pkey and pass >> down their values to the callees, and then write them once at the end of >> the function? > > If my calculations are right: > uamor may be read once and may be written once. > amr may be read once and is written once. > iamr is written once. > So not that bad, i think. 
If I'm following the code correctly:

if need_to_set_mm_pkey = true:
  uamor is read twice and written once.
  amr is read twice and written twice.
  iamr is read twice and written twice.

if need_to_set_mm_pkey = false:
  uamor is read twice.
  amr is read once or twice (depending on the value of uamor) and written once.
  iamr is read once and written once.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center
Re: [RESEND PATCH v5 00/16] eeprom: at24: Add OF device ID table
Hello Wolfram,

On Mon, Jul 31, 2017 at 5:30 PM, Wolfram Sang wrote:
>
>> Patches can be applied independently since the DTS changes without driver
>> changes are no-op and the OF table won't be used without the DTS changes.
>
> But there is a dependency, no? If I apply the driver patch,
> non-converted device trees will not find their eeproms anymore. So, I

I don't think that's correct. If you apply this patch before the DTS
changes, the driver will still match using the I2C device ID table like
it has been doing until today. IOW, this is what will happen:

1- an OF device is registered with the wrong compatible (not found in
   the OF table)
2- the I2C core strips the vendor part and fills the struct i2c_client
   .name with the device part.
3- i2c_device_match() will be called since a new device has been
   registered
4- i2c_of_match_device() will fail because there's no OF entry that
   matches the device compatible
5- the I2C core falls back to i2c_match_id() and matches using the I2C
   device ID table.

So no noticeable difference AFAICT in that case.

Best regards,
Javier
Re: [RESEND PATCH v5 00/16] eeprom: at24: Add OF device ID table
> Patches can be applied independently since the DTS changes without driver
> changes are no-op and the OF table won't be used without the DTS changes.

But there is a dependency, no? If I apply the driver patch,
non-converted device trees will not find their eeproms anymore. So, I
need to wait until all DTS patches are upstream, right?

I can pick patch 1, though. We can already document it.
Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?
On Mon, 31 Jul 2017 08:04:11 -0700 "Paul E. McKenney"wrote: > On Mon, Jul 31, 2017 at 12:08:47PM +0100, Jonathan Cameron wrote: > > On Fri, 28 Jul 2017 12:03:50 -0700 > > "Paul E. McKenney" wrote: > > > > > On Fri, Jul 28, 2017 at 06:27:05PM +0100, Jonathan Cameron wrote: > > > > On Fri, 28 Jul 2017 09:55:29 -0700 > > > > "Paul E. McKenney" wrote: > > > > > > > > > On Fri, Jul 28, 2017 at 02:24:03PM +0100, Jonathan Cameron wrote: > > > > > > On Fri, 28 Jul 2017 08:44:11 +0100 > > > > > > Jonathan Cameron wrote: > > > > > > > > > > [ . . . ] > > > > > > > > > > > Ok. Some info. I disabled a few driver (usb and SAS) in the > > > > > > interest of having > > > > > > fewer timer events. Issue became much easier to trigger (on some > > > > > > runs before > > > > > > I could get tracing up and running) > > > > > >e > > > > > > So logs are large enough that pastebin doesn't like them - please > > > > > > shoet if > > > > > >>e another timer period is of interest. > > > > > > > > > > > > https://pastebin.com/iUZDfQGM for the timer trace. > > > > > > https://pastebin.com/3w1F7amH for dmesg. > > > > > > > > > > > > The relevant timeout on the RCU stall detector was 8 seconds. > > > > > > Event is > > > > > > detected around 835. > > > > > > > > > > > > It's a lot of logs, so I haven't identified a smoking gun yet but > > > > > > there > > > > > > may well be one in there. > > > > > > > > > > The dmesg says: > > > > > > > > > > rcu_preempt kthread starved for 2508 jiffies! g112 c111 f0x0 > > > > > RCU_GP_WAIT_FQS(3) ->state=0x1 > > > > > > > > > > So I look for "rcu_preempt" timer events and find these: > > > > > > > > > > rcu_preempt-9 [019] 827.579114: timer_init: > > > > > timer=8017d5fc7da0 > > > > > rcu_preempt-9 [019] d..1 827.579115: timer_start: > > > > > timer=8017d5fc7da0 function=process_timeout > > > > > > > > > > Next look for "8017d5fc7da0" and I don't find anything else. > > > > It does show up off the bottom of what would fit in pastebin... 
> > > > > > > > rcu_preempt-9 [001] d..1 837.681077: timer_cancel: > > > > timer=8017d5fc7da0 > > > > rcu_preempt-9 [001] 837.681086: timer_init: > > > > timer=8017d5fc7da0 > > > > rcu_preempt-9 [001] d..1 837.681087: timer_start: > > > > timer=8017d5fc7da0 function=process_timeout expires=4295101298 > > > > [timeout=1] cpu=1 idx=0 flags= > > > > > > Odd. I would expect an expiration... And ten seconds is way longer > > > than the requested one jiffy! > > > > > > > > The timeout was one jiffy, and more than a second later, no > > > > > expiration. > > > > > Is it possible that this event was lost? I am not seeing any sign of > > > > > this is the trace. > > > > > > > > > > I don't see any sign of CPU hotplug (and I test with lots of that in > > > > > any case). > > > > > > > > > > The last time we saw something like this it was a timer HW/driver > > > > > problem, > > > > > but it is a bit hard to imagine such a problem affecting both ARM64 > > > > > and SPARC. ;-) > > > > Could be different issues, both of which were hidden by that lockup > > > > detector. > > > > > > > > There is an errata work around for the timers on this particular board. > > > > I'm only vaguely aware of it, so may be unconnected. > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/clocksource/arm_arch_timer.c?h=v4.13-rc2=bb42ca47401010fc02901b5e8f79e40a26f208cb > > > > > > > > Seems unlikely though! + we've not yet seen it on the other chips that > > > > errata effects (not that that means much). > > > > > > If you can reproduce quickly, might be worth trying anyway... > > > > > > Thanx, Paul > > Errata fix is running already and was for all those tests. > > I was afraid of that... ;-) It's a pretty rare errata it seems. Not actually managed to catch one yet. > > > I'll have a dig into the timers today and see where I get to. > > Look forward to seeing what you find! 
Nothing obvious turning up other than we don't seem to have the issue when we aren't running hrtimers. On the plus side I just got a report that it is affecting our d03 boards, which is good on the basis that I couldn't tell what the difference could be wrt this issue! It indeed looks like we are consistently missing a timer before the rcu splat occurs. J > > Thanx, Paul > > > Jonathan > > > > > > > Jonathan > > > > > > > > > > > > > > Thomas, any debugging suggestions? > > > > > > > > > > Thanx, Paul > > > > > > > > > > > > > > >
Re: [PATCH] i2c: Convert to using %pOF instead of full_name
On Tue, Jul 18, 2017 at 04:43:06PM -0500, Rob Herring wrote: > Now that we have a custom printf format specifier, convert users of > full_name to use %pOF instead. This is preparation to remove storing > of the full path string for each node. > > Signed-off-by: Rob Herring> Cc: Haavard Skinnemoen > Cc: Wolfram Sang > Cc: Benjamin Herrenschmidt > Cc: Paul Mackerras > Cc: Michael Ellerman > Cc: Maxime Ripard > Cc: Chen-Yu Tsai > Cc: Peter Rosin > Cc: linux-...@vger.kernel.org > Cc: linuxppc-dev@lists.ozlabs.org > Cc: linux-arm-ker...@lists.infradead.org Applied to for-next, thanks! signature.asc Description: PGP signature
Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?
On Mon, Jul 31, 2017 at 12:08:47PM +0100, Jonathan Cameron wrote: > On Fri, 28 Jul 2017 12:03:50 -0700 > "Paul E. McKenney" wrote: > > > On Fri, Jul 28, 2017 at 06:27:05PM +0100, Jonathan Cameron wrote: > > > On Fri, 28 Jul 2017 09:55:29 -0700 > > > "Paul E. McKenney" wrote: > > > > > > > On Fri, Jul 28, 2017 at 02:24:03PM +0100, Jonathan Cameron wrote: > > > > > On Fri, 28 Jul 2017 08:44:11 +0100 > > > > > Jonathan Cameron wrote: > > > > > > > > [ . . . ] > > > > > > > > > Ok. Some info. I disabled a few drivers (usb and SAS) in the > > > > > interest of having > > > > > fewer timer events. Issue became much easier to trigger (on some > > > > > runs before > > > > > I could get tracing up and running) > > > > > So logs are large enough that pastebin doesn't like them - please > > > > > shout if > > > > > another timer period is of interest. > > > > > > > > > > https://pastebin.com/iUZDfQGM for the timer trace. > > > > > https://pastebin.com/3w1F7amH for dmesg. > > > > > > > > > > The relevant timeout on the RCU stall detector was 8 seconds. Event > > > > > is > > > > > detected around 835. > > > > > > > > > > It's a lot of logs, so I haven't identified a smoking gun yet but > > > > > there > > > > > may well be one in there. > > > > > > > > The dmesg says: > > > > > > > > rcu_preempt kthread starved for 2508 jiffies! g112 c111 f0x0 > > > > RCU_GP_WAIT_FQS(3) ->state=0x1 > > > > > > > > So I look for "rcu_preempt" timer events and find these: > > > > > > > > rcu_preempt-9 [019] 827.579114: timer_init: > > > > timer=8017d5fc7da0 > > > > rcu_preempt-9 [019] d..1 827.579115: timer_start: > > > > timer=8017d5fc7da0 function=process_timeout > > > > > > > > Next look for "8017d5fc7da0" and I don't find anything else. > > > It does show up off the bottom of what would fit in pastebin... 
> > > > > > rcu_preempt-9 [001] d..1 837.681077: timer_cancel: > > > timer=8017d5fc7da0 > > > rcu_preempt-9 [001] 837.681086: timer_init: > > > timer=8017d5fc7da0 > > > rcu_preempt-9 [001] d..1 837.681087: timer_start: > > > timer=8017d5fc7da0 function=process_timeout expires=4295101298 > > > [timeout=1] cpu=1 idx=0 flags= > > > > Odd. I would expect an expiration... And ten seconds is way longer > > than the requested one jiffy! > > > > > > The timeout was one jiffy, and more than a second later, no expiration. > > > > Is it possible that this event was lost? I am not seeing any sign of > > > > this is the trace. > > > > > > > > I don't see any sign of CPU hotplug (and I test with lots of that in > > > > any case). > > > > > > > > The last time we saw something like this it was a timer HW/driver > > > > problem, > > > > but it is a bit hard to imagine such a problem affecting both ARM64 > > > > and SPARC. ;-) > > > Could be different issues, both of which were hidden by that lockup > > > detector. > > > > > > There is an errata work around for the timers on this particular board. > > > I'm only vaguely aware of it, so may be unconnected. > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/clocksource/arm_arch_timer.c?h=v4.13-rc2=bb42ca47401010fc02901b5e8f79e40a26f208cb > > > > > > Seems unlikely though! + we've not yet seen it on the other chips that > > > errata effects (not that that means much). > > > > If you can reproduce quickly, might be worth trying anyway... > > > > Thanx, Paul > Errata fix is running already and was for all those tests. I was afraid of that... ;-) > I'll have a dig into the timers today and see where I get to. Look forward to seeing what you find! Thanx, Paul > Jonathan > > > > > Jonathan > > > > > > > > > > > Thomas, any debugging suggestions? > > > > > > > > Thanx, Paul > > > > > > > > > >
RE: [PATCH 1/5] Fix packed and aligned attribute warnings.
From: SZ Lin > Sent: 29 July 2017 08:24 ... > diff --git a/drivers/char/tpm/tpm_ibmvtpm.h b/drivers/char/tpm/tpm_ibmvtpm.h > index 91dfe766d080..9f708ca3dc84 100644 > --- a/drivers/char/tpm/tpm_ibmvtpm.h > +++ b/drivers/char/tpm/tpm_ibmvtpm.h > @@ -25,7 +25,7 @@ struct ibmvtpm_crq { > __be16 len; > __be32 data; > __be64 reserved; > -} __attribute__((packed, aligned(8))); > +} __packed __aligned(8); You can't need __packed and __aligned(8) on that structure. There are no gaps and you are saying it is always aligned. So just remove the pointless attributes. David
Re: [RFC v6 20/62] powerpc: store and restore the pkey state across context switches
Ram Pai writes: > On Thu, Jul 27, 2017 at 02:32:59PM -0300, Thiago Jung Bauermann wrote: >> Ram Pai writes: >> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c >> > index 2ad725e..9429361 100644 >> > --- a/arch/powerpc/kernel/process.c >> > +++ b/arch/powerpc/kernel/process.c >> > @@ -1096,6 +1096,11 @@ static inline void save_sprs(struct thread_struct >> > *t) >> >t->tar = mfspr(SPRN_TAR); >> >} >> > #endif >> > +#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS >> > + t->amr = mfspr(SPRN_AMR); >> > + t->iamr = mfspr(SPRN_IAMR); >> > + t->uamor = mfspr(SPRN_UAMOR); >> > +#endif >> > } >> > >> > static inline void restore_sprs(struct thread_struct *old_thread, >> > @@ -1131,6 +1136,14 @@ static inline void restore_sprs(struct >> > thread_struct *old_thread, >> >mtspr(SPRN_TAR, new_thread->tar); >> >} >> > #endif >> > +#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS >> > + if (old_thread->amr != new_thread->amr) >> > + mtspr(SPRN_AMR, new_thread->amr); >> > + if (old_thread->iamr != new_thread->iamr) >> > + mtspr(SPRN_IAMR, new_thread->iamr); >> > + if (old_thread->uamor != new_thread->uamor) >> > + mtspr(SPRN_UAMOR, new_thread->uamor); >> > +#endif >> > } >> >> Shouldn't the saving and restoring of the SPRs be guarded by a check for >> whether memory protection keys are enabled? What happens when trying to >> access these registers on a CPU which doesn't have them? > > Good point. Need to guard it. However, I think these registers have been > available since power6. The kernel runs on CPUs much older than that. IAMR was added on Power8. And performance is also an issue, so we should only switch them when we need to. cheers
Re: [RFC v6 19/62] powerpc: ability to create execute-disabled pkeys
Ram Pai writes: > On Thu, Jul 27, 2017 at 11:54:31AM -0300, Thiago Jung Bauermann wrote: >> >> Ram Pai writes: >> >> > --- a/arch/powerpc/include/asm/pkeys.h >> > +++ b/arch/powerpc/include/asm/pkeys.h >> > @@ -2,6 +2,18 @@ >> > #define _ASM_PPC64_PKEYS_H >> > >> > extern bool pkey_inited; >> > +/* override any generic PKEY Permission defines */ >> > +#undef PKEY_DISABLE_ACCESS >> > +#define PKEY_DISABLE_ACCESS0x1 >> > +#undef PKEY_DISABLE_WRITE >> > +#define PKEY_DISABLE_WRITE 0x2 >> > +#undef PKEY_DISABLE_EXECUTE >> > +#define PKEY_DISABLE_EXECUTE 0x4 >> > +#undef PKEY_ACCESS_MASK >> > +#define PKEY_ACCESS_MASK (PKEY_DISABLE_ACCESS |\ >> > + PKEY_DISABLE_WRITE |\ >> > + PKEY_DISABLE_EXECUTE) >> > + >> >> Is it ok to #undef macros from another header? Especially since said >> header is in uapi (include/uapi/asm-generic/mman-common.h). >> >> Also, it's unnecessary to undef the _ACCESS and _WRITE macros since they >> are identical to the original definition. And since these macros are >> originally defined in an uapi header, the powerpc-specific ones should >> be in an uapi header as well, if I understand it correctly. > > The architecture-neutral code allows the implementation to define the > macros to its taste. powerpc headers, due to legacy reasons, include the > include/uapi/asm-generic/mman-common.h header. That header includes the > generic definitions of only PKEY_DISABLE_ACCESS and PKEY_DISABLE_WRITE. > Unfortunately we end up importing them. I don't want to depend on them. > Any changes there could affect us. For example, if the generic uapi header > changed PKEY_DISABLE_ACCESS to 0x4, we would have a conflict with > PKEY_DISABLE_EXECUTE. Hence I undef them and define them my way. Don't do that. The generic header can't change the values, it's an ABI. 
Doing it this way risks the uapi value diverging from the value used in the powerpc code (due to a change in the powerpc version), which would mean userspace and the kernel wouldn't agree on what the values meant ... which would be exciting. cheers
[PATCH v2 3/3] powerpc/strict_kernel_rwx: Don't depend on !RELOCATABLE
The concerns with extra permissions and overlap have been addressed, so remove the dependency on !RELOCATABLE.

Signed-off-by: Balbir Singh
---
 arch/powerpc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 36f858c..b5b8ba8 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -165,7 +165,7 @@ config PPC
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS	if COMPAT
 	select HAVE_ARCH_SECCOMP_FILTER
 	select HAVE_ARCH_TRACEHOOK
-	select ARCH_HAS_STRICT_KERNEL_RWX	if (PPC_BOOK3S_64 && !RELOCATABLE && !HIBERNATION)
+	select ARCH_HAS_STRICT_KERNEL_RWX	if (PPC_BOOK3S_64 && !HIBERNATION)
 	select ARCH_OPTIONAL_KERNEL_RWX		if ARCH_HAS_STRICT_KERNEL_RWX
 	select HAVE_CBPF_JIT			if !PPC64
 	select HAVE_CONTEXT_TRACKING		if PPC64
-- 
2.9.4
[PATCH v2 1/3] powerpc/mm/radix: Fix relocatable radix mappings for STRICT_RWX
The mappings now do perfect kernel pte mappings even when the kernel is relocated. This patch refactors create_physical_mapping() and mark_rodata_ro(). create_physical_mapping() is now largely done with a helper called __create_physical_mapping(), which is defined differently for when CONFIG_STRICT_KERNEL_RWX is enabled and when its not. The goal of the patchset is to provide minimal changes when the CONFIG_STRICT_KERNEL_RWX is disabled, when enabled however, we do split the linear mapping so that permissions are strictly adherent to expectations from the user. Signed-off-by: Balbir Singh--- arch/powerpc/mm/pgtable-radix.c | 183 +--- 1 file changed, 151 insertions(+), 32 deletions(-) diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c index 671a45d..6e0176d 100644 --- a/arch/powerpc/mm/pgtable-radix.c +++ b/arch/powerpc/mm/pgtable-radix.c @@ -164,8 +164,14 @@ void radix__mark_rodata_ro(void) end = (unsigned long)__init_begin; radix__change_memory_range(start, end, _PAGE_WRITE); + + start = (unsigned long)__start_interrupts - PHYSICAL_START; + end = (unsigned long)__end_interrupts - PHYSICAL_START; + + radix__change_memory_range(start, end, _PAGE_WRITE); } + void radix__mark_initmem_nx(void) { unsigned long start = (unsigned long)__init_begin; @@ -173,6 +179,7 @@ void radix__mark_initmem_nx(void) radix__change_memory_range(start, end, _PAGE_EXEC); } + #endif /* CONFIG_STRICT_KERNEL_RWX */ static inline void __meminit print_mapping(unsigned long start, @@ -185,31 +192,36 @@ static inline void __meminit print_mapping(unsigned long start, pr_info("Mapped range 0x%lx - 0x%lx with 0x%lx\n", start, end, size); } -static int __meminit create_physical_mapping(unsigned long start, -unsigned long end) +/* + * Create physical mapping and return the last mapping size + * If the call is successful, end_of_mapping will return the + * last address mapped via this call, if not, it will leave + * the value untouched. 
+ */ +static int __meminit __create_physical_mapping(unsigned long vstart, + unsigned long vend, pgprot_t prot, + unsigned long *end_of_mapping) { - unsigned long vaddr, addr, mapping_size = 0; - pgprot_t prot; - unsigned long max_mapping_size; -#ifdef CONFIG_STRICT_KERNEL_RWX - int split_text_mapping = 1; -#else - int split_text_mapping = 0; -#endif + unsigned long mapping_size = 0; + static unsigned long previous_size; + unsigned long addr, start, end; + start = __pa(vstart); + end = __pa(vend); start = _ALIGN_UP(start, PAGE_SIZE); + + pr_devel("physical_mapping start %lx->%lx, prot %lx\n", +vstart, vend, pgprot_val(prot)); + for (addr = start; addr < end; addr += mapping_size) { - unsigned long gap, previous_size; + unsigned long gap; int rc; gap = end - addr; previous_size = mapping_size; - max_mapping_size = PUD_SIZE; -retry: if (IS_ALIGNED(addr, PUD_SIZE) && gap >= PUD_SIZE && - mmu_psize_defs[MMU_PAGE_1G].shift && - PUD_SIZE <= max_mapping_size) + mmu_psize_defs[MMU_PAGE_1G].shift) mapping_size = PUD_SIZE; else if (IS_ALIGNED(addr, PMD_SIZE) && gap >= PMD_SIZE && mmu_psize_defs[MMU_PAGE_2M].shift) @@ -217,40 +229,147 @@ static int __meminit create_physical_mapping(unsigned long start, else mapping_size = PAGE_SIZE; - if (split_text_mapping && (mapping_size == PUD_SIZE) && - (addr <= __pa_symbol(__init_begin)) && - (addr + mapping_size) >= __pa_symbol(_stext)) { - max_mapping_size = PMD_SIZE; - goto retry; + if (previous_size != mapping_size) { + print_mapping(start, addr, previous_size); + start = addr; + previous_size = mapping_size; } - if (split_text_mapping && (mapping_size == PMD_SIZE) && - (addr <= __pa_symbol(__init_begin)) && - (addr + mapping_size) >= __pa_symbol(_stext)) - mapping_size = PAGE_SIZE; + rc = radix__map_kernel_page((unsigned long)__va(addr), addr, + prot, mapping_size); + if (rc) + return rc; + } - if (mapping_size != previous_size) { - print_mapping(start, addr, previous_size); - start = addr; +
[PATCH v2 2/3] powerpc/mm/hash: WARN if relocation is enabled and CONFIG_STRICT_KERNEL_RWX
For radix we split the mapping into smaller page sizes (at the cost of additional TLB overhead), but for hash its best to print a warning. In the case of hash and no-relocation, the kernel should be well aligned to provide the least overhead with the current linear mapping size (16M) Signed-off-by: Balbir Singh--- arch/powerpc/mm/pgtable-hash64.c | 28 ++-- 1 file changed, 26 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/mm/pgtable-hash64.c b/arch/powerpc/mm/pgtable-hash64.c index 443a2c6..656f7f3 100644 --- a/arch/powerpc/mm/pgtable-hash64.c +++ b/arch/powerpc/mm/pgtable-hash64.c @@ -434,8 +434,26 @@ static bool hash__change_memory_range(unsigned long start, unsigned long end, shift = mmu_psize_defs[mmu_linear_psize].shift; step = 1 << shift; - start = ALIGN_DOWN(start, step); - end = ALIGN(end, step); // aligns up + if (!IS_ALIGNED(PHYSICAL_START, step)) { + /* +* For the relocatable case we might have +* a case where _stext shares the page +* with rw memory or __init_begin might +* share the page with executable text. +* This breaks strict RWX, but allows the +* kernel to boot. If PHYSICAL_START is mmu_linear_psize +* aligned, then we can continue to make the same +* assumptions as the non-relocatable case. +* +* TODO: If we really care about the relocatable +* case, we can align __init_begin/end better. +*/ + start = ALIGN(start, step); + end = ALIGN_DOWN(end, step); + } else { + start = ALIGN_DOWN(start, step); + end = ALIGN(end, step); /* Aligns up */ + } if (start >= end) return false; @@ -455,6 +473,12 @@ void hash__mark_rodata_ro(void) { unsigned long start, end; + if (PHYSICAL_START > MEMORY_START) + pr_warn("Detected relocation and CONFIG_STRICT_KERNEL_RWX " + "permissions are best effort, some non-text area " + "might still be left as executable"); + + start = (unsigned long)_stext; end = (unsigned long)__init_begin; -- 2.9.4
[PATCH v2 0/3] Have CONFIG_STRICT_KERNEL_RWX work with CONFIG_RELOCATABLE
These patches make CONFIG_STRICT_KERNEL_RWX work with CONFIG_RELOCATABLE The first patch splits up the radix linear mapping nicely on relocation to support granular read-only and execution bits. The second patch warns if relocation is actually done (PHYSICAL_START > MEMORY_START), we do best effort support of expected permissions. We could do more granular linear mapping, but we decided to leave it as a TODO (to check for performance/MPSS/etc). The last patch changes the config so that we are no longer dependent on !RELOCATABLE for CONFIG_STRICT_KERNEL_RWX feature. Changelog v2 - Rebase on top of the changes made in v4.13 - Move hash tables to IS_ALIGNED logic Balbir Singh (3): powerpc/mm/radix: Fix relocatable radix mappings for STRICT_RWX powerpc/mm/hash: WARN if relocation is enabled and CONFIG_STRICT_KERNEL_RWX powerpc/strict_kernel_rwx: Don't depend on !RELOCATABLE arch/powerpc/Kconfig | 2 +- arch/powerpc/mm/pgtable-hash64.c | 28 +- arch/powerpc/mm/pgtable-radix.c | 183 --- 3 files changed, 178 insertions(+), 35 deletions(-) -- 2.9.4
Re: blk_mq_sched_insert_request: inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage
Brian King writes: > On 07/28/2017 10:17 AM, Brian J King wrote: >> Jens Axboe wrote on 07/28/2017 09:25:48 AM: >> >>> Can you try the below fix? Should be more palatable than the previous >>> one. Brian, maybe you can take a look at the IRQ issue mentioned above? > > Michael, > > Does this address the issue you are seeing? Yes it seems to, thanks. I only see the trace on reboot, and not 100% of the time. But I've survived a couple of reboots now without seeing anything, so I think this is helping. I'll put the patch in my Jenkins overnight and let you know how it survives that, which should be ~= 25 boots. cheers
Re: [PATCH 23/24] powerpc/mm: Cleanup check for stack expansion
On 25/07/2017 at 13:19, Michael Ellerman wrote: LEROY Christophe writes: Michael Ellerman wrote: LEROY Christophe writes: Benjamin Herrenschmidt wrote: When hitting below a VM_GROWSDOWN vma (typically growing the stack), we check whether it's a valid stack-growing instruction and we check the distance to GPR1. This is largely open coded with lots of comments, so move it out to a helper. Did you have a look at the following patch? It's been waiting for application for some weeks now. https://patchwork.ozlabs.org/patch/771869 I actually merged it last merge window, but found I had no good way to test it, so I took it out again until I can write a test case for it. The way I realised it wasn't being tested was by removing all the store_updates_sp logic entirely and having my system run happily for several days :} Which demonstrates how unlikely this is, hence doing that get_user() at every fault is a waste of time. Yes I agree. How do you plan to handle that in parallel to Ben's series? Not sure :) I'll be back from vacation next week and may help find a way to test that. (A test program using alloca()?) I was thinking hand-crafted asm, but that might be a pain to get working for 32 & 64-bit, in which case alloca() might work. No need for anything very sophisticated indeed. The following app does the trick. If I modify store_updates_sp() to always return 0, the app gets a SIGSEGV.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	char buf[1024 * 1025];

	sprintf(buf, "Hello world !\n");
	printf(buf);
	exit(0);
}

Christophe cheers
Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?
On Wed, 26 Jul 2017 16:15:05 -0700 "Paul E. McKenney" wrote: > On Wed, Jul 26, 2017 at 03:45:40PM -0700, David Miller wrote: > > From: "Paul E. McKenney" > > Date: Wed, 26 Jul 2017 15:36:58 -0700 > > > > > And without CONFIG_SOFTLOCKUP_DETECTOR, I see five runs of 24 with RCU > > > CPU stall warnings. So it seems likely that CONFIG_SOFTLOCKUP_DETECTOR > > > really is having an effect. > > > > Thanks for all of the info Paul, I'll digest this and scan over the > > code myself. > > > > Just out of curiosity, what x86 idle method is your machine using? > > The mwait one or the one which simply uses 'halt'? The mwait variant > > might mask this bug, and halt would be a lot closer to how sparc64 and > > Jonathan's system operates. > > My kernel builds with CONFIG_INTEL_IDLE=n, which I believe means that > I am not using the mwait one. Here is a grep for IDLE in my .config: > > CONFIG_NO_HZ_IDLE=y > CONFIG_GENERIC_SMP_IDLE_THREAD=y > # CONFIG_IDLE_PAGE_TRACKING is not set > CONFIG_ACPI_PROCESSOR_IDLE=y > CONFIG_CPU_IDLE=y > # CONFIG_CPU_IDLE_GOV_LADDER is not set > CONFIG_CPU_IDLE_GOV_MENU=y > # CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED is not set > # CONFIG_INTEL_IDLE is not set > > > On sparc64 the cpu yield we do in the idle loop sleeps the cpu. It's > > local TICK register keeps advancing, and the local timer therefore > > will still trigger. Also, any externally generated interrupts > > (including cross calls) will wake up the cpu as well. > > > > The tick-sched code is really tricky wrt. NO_HZ even in the NO_HZ_IDLE > > case. One of my running theories is that we miss scheduling a tick > > due to a race. That would be consistent with the behavior we see > > in the RCU dumps, I think. > > But wouldn't you have to miss a -lot- of ticks to get an RCU CPU stall > warning? By default, your grace period needs to extend for more than > 21 seconds (more than one-third of a -minute-) to get one. 
Or do > you mean that the ticks get shut off now and forever, as opposed to > just losing one of them? > > > Anyways, just a theory, and that's why I keep mentioning that commit > > about the revert of the revert (specifically > > 411fe24e6b7c283c3a1911450cdba6dd3aaea56e). > > > > :-) > > I am running an overnight test in preparation for attempting to push > some fixes for regressions into 4.12, but will try reverting this > and enabling CONFIG_HZ_PERIODIC tomorrow. > > Jonathan, might the commit that Dave points out above be what reduces > the probability of occurrence as you test older releases? I just got around to trying this out of curiosity. Superficially it did appear to possibly make the issue harder to hit took over 30 minutes but the issue otherwise looks much the same with or without that patch. Just out of curiosity, next thing on my list is to disable hrtimers entirely and see what happens. Jonathan > > Thanx, Paul >
Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?
On Fri, 28 Jul 2017 12:03:50 -0700 "Paul E. McKenney" wrote: > On Fri, Jul 28, 2017 at 06:27:05PM +0100, Jonathan Cameron wrote: > > On Fri, 28 Jul 2017 09:55:29 -0700 > > "Paul E. McKenney" wrote: > > > > > On Fri, Jul 28, 2017 at 02:24:03PM +0100, Jonathan Cameron wrote: > > > > On Fri, 28 Jul 2017 08:44:11 +0100 > > > > Jonathan Cameron wrote: > > > > > > [ . . . ] > > > > > > > Ok. Some info. I disabled a few drivers (usb and SAS) in the interest > > > > of having > > > > fewer timer events. Issue became much easier to trigger (on some runs > > > > before > > > > I could get tracing up and running) > > > > So logs are large enough that pastebin doesn't like them - please shout > > > > if > > > > another timer period is of interest. > > > > > > > > https://pastebin.com/iUZDfQGM for the timer trace. > > > > https://pastebin.com/3w1F7amH for dmesg. > > > > > > > > The relevant timeout on the RCU stall detector was 8 seconds. Event is > > > > detected around 835. > > > > > > > > It's a lot of logs, so I haven't identified a smoking gun yet but there > > > > may well be one in there. > > > > > > The dmesg says: > > > > > > rcu_preempt kthread starved for 2508 jiffies! g112 c111 f0x0 > > > RCU_GP_WAIT_FQS(3) ->state=0x1 > > > > > > So I look for "rcu_preempt" timer events and find these: > > > > > > rcu_preempt-9 [019] 827.579114: timer_init: > > > timer=8017d5fc7da0 > > > rcu_preempt-9 [019] d..1 827.579115: timer_start: > > > timer=8017d5fc7da0 function=process_timeout > > > > > > Next look for "8017d5fc7da0" and I don't find anything else. > > It does show up off the bottom of what would fit in pastebin... > > > > rcu_preempt-9 [001] d..1 837.681077: timer_cancel: > > timer=8017d5fc7da0 > > rcu_preempt-9 [001] 837.681086: timer_init: > > timer=8017d5fc7da0 > > rcu_preempt-9 [001] d..1 837.681087: timer_start: > > timer=8017d5fc7da0 function=process_timeout expires=4295101298 > > [timeout=1] cpu=1 idx=0 flags= > > Odd. I would expect an expiration... 
And ten seconds is way longer > than the requested one jiffy! > > > > The timeout was one jiffy, and more than a second later, no expiration. > > > Is it possible that this event was lost? I am not seeing any sign of > > > this is the trace. > > > > > > I don't see any sign of CPU hotplug (and I test with lots of that in > > > any case). > > > > > > The last time we saw something like this it was a timer HW/driver problem, > > > but it is a bit hard to imagine such a problem affecting both ARM64 > > > and SPARC. ;-) > > Could be different issues, both of which were hidden by that lockup > > detector. > > > > There is an errata work around for the timers on this particular board. > > I'm only vaguely aware of it, so may be unconnected. > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/clocksource/arm_arch_timer.c?h=v4.13-rc2=bb42ca47401010fc02901b5e8f79e40a26f208cb > > > > Seems unlikely though! + we've not yet seen it on the other chips that > > errata effects (not that that means much). > > If you can reproduce quickly, might be worth trying anyway... > > Thanx, Paul Errata fix is running already and was for all those tests. I'll have a dig into the timers today and see where I get to. Jonathan > > > Jonathan > > > > > > > > Thomas, any debugging suggestions? > > > > > > Thanx, Paul > > > > > >
RE: [PATCH v2] qe: fix compile issue for arm64
Qiang Zhao writes: > Fri 7/28/2017 2:14 PM, Michael Ellerman wrote: > >> -Original Message- >> From: Michael Ellerman [mailto:m...@ellerman.id.au] >> Sent: Friday, July 28, 2017 2:14 PM >> To: Qiang Zhao ; o...@buserror.net >> Cc: valentin.longch...@keymile.com; linuxppc-dev@lists.ozlabs.org; linux- >> ker...@vger.kernel.org; Qiang Zhao >> Subject: Re: [PATCH v2] qe: fix compile issue for arm64 >> >> Zhao Qiang writes: >> >> > Signed-off-by: Zhao Qiang >> > --- >> > Changes for v2: >> >- include all Errata QE_General4 in #ifdef >> > >> > drivers/soc/fsl/qe/qe.c | 2 ++ >> > 1 file changed, 2 insertions(+) >> >> AFAICS this driver can only be built on PPC, what am I missing? >> >> config QUICC_ENGINE >> bool "Freescale QUICC Engine (QE) Support" >> depends on FSL_SOC && PPC32 >> >> cheers > > I sent another patchset to support it on arm64. Where? I don't see it. Shouldn't this patch be part of that series? Otherwise when that series is merged the build will break on arm64. cheers
Re: [PATCH 2/5] Fix "ERROR: code indent should use tabs where possible"
SZ Lin writes: > ERROR: code indent should use tabs where possible > +^I^I "Need to wait for TPM to finish\n");$ > > Signed-off-by: SZ Lin > --- > drivers/char/tpm/tpm_ibmvtpm.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c > index f01d083eced2..23913fc86158 100644 > --- a/drivers/char/tpm/tpm_ibmvtpm.c > +++ b/drivers/char/tpm/tpm_ibmvtpm.c > @@ -127,7 +127,7 @@ static int tpm_ibmvtpm_send(struct tpm_chip *chip, u8 > *buf, size_t count) > > if (ibmvtpm->tpm_processing_cmd) { > dev_info(ibmvtpm->dev, > - "Need to wait for TPM to finish\n"); > + "Need to wait for TPM to finish\n"); There's no reason for that to be on a separate line at all. Just make it a single line dev_info( ... ); cheers
Re: [RFC PATCH] powerpc: improve accounting of non maskable interrupts
Nicholas Piggin writes: > This fixes a case of double counting MCEs on PowerNV. > > Adds a counter for the system reset interrupt, which will > see more use as a debugging NMI. > > Adds a soft-NMI counter for the 64s watchdog. Although this could cause > confusion because it only fires when interrupts are soft-disabled, so it > won't increment much even when the watchdog is running. > > Signed-off-by: Nicholas Piggin > --- > I can split these out or drop any objectionable bits. At least the > MCE we should fix, not sure if the other bits are wanted. Yeah if you can split it please that would be good. I'm not sure how useful it is to count the SOFT NMIs. I guess it's not much overhead so we may as well, if nothing else it might be handy for us when debugging. If you can make the ifdef look less horrendous that would be great :D - perhaps a Kconfig symbol that keys off both. cheers
Re: [PATCH] mpc832x_rdb: fix of_irq_to_resource() error check
Scott Wood writes: > On Sat, 2017-07-29 at 22:52 +0300, Sergei Shtylyov wrote: >> of_irq_to_resource() has recently been fixed to return negative error #'s >> along with 0 in case of failure, however the Freescale MPC832x RDB board >> code still only regards 0 as a failure indication -- fix it up. >> >> Fixes: 7a4228bbff76 ("of: irq: use of_irq_get() in of_irq_to_resource()") >> Signed-off-by: Sergei Shtylyov >> >> --- >> The patch is against the 'master' branch of Scott Wood's 'linux.git' repo >> (the 'fixes' branch is too far behind). > > The master branch is also old. Those branches are only used when needed to > apply patches; I don't update them just to sync up. If they're older than > what's in Michael's or Linus's tree (as they almost always are), then use > those instead. > > Not that I expect it to make a difference to this patch... Do you want me to grab this as a fix for 4.13 ? cheers
[PATCH] powerpc/perf: Add PM_LD_MISS_L1 and PM_BR_2PATH to power9 event list
Add a couple more events (PM_LD_MISS_L1 and PM_BR_2PATH) to the power9 event list and the power9_event_alternatives array (these events can be counted in more than one PMC).

Signed-off-by: Madhavan Srinivasan
---
 arch/powerpc/perf/power9-events-list.h | 8 ++++++++
 arch/powerpc/perf/power9-pmu.c         | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/arch/powerpc/perf/power9-events-list.h b/arch/powerpc/perf/power9-events-list.h
index 50689180a6c1..c5ed95d8f976 100644
--- a/arch/powerpc/perf/power9-events-list.h
+++ b/arch/powerpc/perf/power9-events-list.h
@@ -23,6 +23,9 @@ EVENT(PM_BR_MPRED_CMPL,	0x400f6)
 EVENT(PM_LD_REF_L1,	0x100fc)
 /* Load Missed L1 */
 EVENT(PM_LD_MISS_L1_FIN,	0x2c04e)
+EVENT(PM_LD_MISS_L1,	0x3e054)
+/* Alternate event code for PM_LD_MISS_L1 */
+EVENT(PM_LD_MISS_L1_ALT,	0x400f0)
 /* Store Missed L1 */
 EVENT(PM_ST_MISS_L1,	0x300f0)
 /* L1 cache data prefetches */
@@ -62,3 +65,8 @@ EVENT(PM_INST_DISP,	0x200f2)
 EVENT(PM_INST_DISP_ALT,	0x300f2)
 /* Alternate Branch event code */
 EVENT(PM_BR_CMPL_ALT,	0x10012)
+/* Branch events that are not strongly biased */
+EVENT(PM_BR_2PATH,	0x20036)
+/* Alternate branch events that are not strongly biased */
+EVENT(PM_BR_2PATH_ALT,	0x40036)
+
diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
index 805bdfcb38ec..ed92159adf3e 100644
--- a/arch/powerpc/perf/power9-pmu.c
+++ b/arch/powerpc/perf/power9-pmu.c
@@ -109,6 +109,8 @@ static const unsigned int power9_event_alternatives[][MAX_ALT] = {
 	{ PM_INST_DISP,		PM_INST_DISP_ALT },
 	{ PM_RUN_CYC_ALT,	PM_RUN_CYC },
 	{ PM_RUN_INST_CMPL_ALT,	PM_RUN_INST_CMPL },
+	{ PM_LD_MISS_L1,	PM_LD_MISS_L1_ALT },
+	{ PM_BR_2PATH,		PM_BR_2PATH_ALT },
 };
 
 static int power9_get_alternatives(u64 event, unsigned int flags, u64 alt[])
-- 
2.7.4
Re: [PATCH v2] powerpc/powernv: Use darn instr for random_seed on p9
Hi Matt, A few comments inline ... Matt Brown writes: > Currently ppc_md.get_random_seed uses the powernv_get_random_long function. > A guest calling this function would have to go through the hypervisor. The This is not quite right. The powernv routine is only ever used on bare metal. In a guest we use pseries_get_random_long(), which does go via the hypervisor. On both Power8 and Power9 there is a hardware RNG per chip. The difference is that on Power8 we had to go and find it in the device tree and read from it via MMIO. On Power9 that is all done transparently for us by the DARN instruction. > 'darn' instruction, introduced in POWER9, allows us to bypass this by > directly obtaining a value from the mmio region. > > This patch adds a function for ppc_md.get_random_seed on p9, > utilising the darn instruction. > > Signed-off-by: Matt Brown > --- > v2: > - remove repeat darn attempts > - move hook to rng_init > --- > arch/powerpc/include/asm/ppc-opcode.h | 4 > arch/powerpc/platforms/powernv/rng.c | 22 ++ > 2 files changed, 26 insertions(+) > > diff --git a/arch/powerpc/include/asm/ppc-opcode.h > b/arch/powerpc/include/asm/ppc-opcode.h > index c4ced1d..d5f7082 100644 > --- a/arch/powerpc/include/asm/ppc-opcode.h > +++ b/arch/powerpc/include/asm/ppc-opcode.h > @@ -134,6 +134,7 @@ > #define PPC_INST_COPY 0x7c00060c > #define PPC_INST_COPY_FIRST 0x7c20060c > #define PPC_INST_CP_ABORT 0x7c00068c > +#define PPC_INST_DARN 0x7c0005e6 That looks right to me. > @@ -325,6 +326,9 @@ > > /* Deal with instructions that older assemblers aren't aware of */ > #define PPC_CP_ABORT stringify_in_c(.long PPC_INST_CP_ABORT) > +#define PPC_DARN(t, l) stringify_in_c(.long PPC_INST_DARN | \ > + ___PPC_RT(t) | \ > + ___PPC_RA(l)) But this is not quite right. 
The macros are: #define ___PPC_RA(a)(((a) & 0x1f) << 16) #define ___PPC_RS(s)(((s) & 0x1f) << 21) #define ___PPC_RT(t)___PPC_RS(t) But the definition of darn is:

  +---------+---------+---------+---------+---------+--------+---+
  | 31      | RT      | /       | L       | /       | 755    | / |
  | 31 - 26 | 25 - 21 | 20 - 18 | 17 - 16 | 15 - 11 | 10 - 1 | 0 |
  +---------+---------+---------+---------+---------+--------+---+

Using ___PPC_RT() gets you the right shift and mask, but because it's the triple underscore version, it doesn't check that you pass a register number to it. You should use __PPC_RT() instead. And ___PPC_RA() is not quite right. The L field is only 2 bits wide, not the 5 that ___PPC_RA() allows. We don't have a __PPC_L() macro, because L fields vary in size and location. So I think you're best off open-coding it, eg: +#define PPC_DARN(t, l) stringify_in_c(.long PPC_INST_DARN | \ + __PPC_RT(t)| \ + (((l) & 0x3) << 16)) > diff --git a/arch/powerpc/platforms/powernv/rng.c > b/arch/powerpc/platforms/powernv/rng.c > index 5dcbdea..ab6f411 100644 --- a/arch/powerpc/platforms/powernv/rng.c > +++ b/arch/powerpc/platforms/powernv/rng.c > @@ -8,6 +8,7 @@ > */ > > #define pr_fmt(fmt) "powernv-rng: " fmt > +#define DARN_ERR 0xFFFFFFFFFFFFFFFFul Usual place for constants is after all the includes, before the first code or variables. > #include > #include > @@ -16,6 +17,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -67,6 +69,21 @@ int powernv_get_random_real_mode(unsigned long *v) > return 1; > } > > +int powernv_get_random_darn(unsigned long *v) > +{ > + unsigned long val; > + > + /* Using DARN with L=1 - conditioned random number */ Actually L=1 is a 64-bit conditioned random number, vs L=0 for 32-bit.
> + asm (PPC_DARN(%0, 1)"\n" : "=r"(val) :); > + > + if (val == DARN_ERR) > + return 0; > + > + *v = val; > + > + return 1; > +} > + > int powernv_get_random_long(unsigned long *v) > { > struct powernv_rng *rng; > @@ -136,6 +153,7 @@ static __init int rng_create(struct device_node *dn) > static __init int rng_init(void) > { > struct device_node *dn; > + unsigned long drn_test; Buy yourself a vowel! ;) > int rc; > > for_each_compatible_node(dn, NULL, "ibm,power-rng") { > @@ -150,6 +168,10 @@ static __init int rng_init(void) > of_platform_device_create(dn, NULL, NULL); > } > > + if (cpu_has_feature(CPU_FTR_ARCH_300) && > + powernv_get_random_darn(&drn_test)) > + ppc_md.get_random_seed = powernv_get_random_darn; I know we took the loop out of powernv_get_random_darn(), but we might want to put it back in this case. ie. we don't want to skip
Re: [PATCH] powerpc/asm/cacheflush: Cleanup cacheflush function params
On 20/07/2017 at 08:28, Matt Brown wrote: The cacheflush prototypes currently use start and stop values and each call requires typecasting the address to an unsigned long. This patch changes the cacheflush prototypes to follow the x86 style of using base and size values, with base being a void pointer. All callers of the cacheflush functions, including drivers, have been modified to conform to the new prototypes. The 64 bit cacheflush functions which were implemented in assembly code (flush_dcache_range, flush_inval_dcache_range) have been translated into C for readability and coherence. Signed-off-by: Matt Brown --- arch/powerpc/include/asm/cacheflush.h| 47 + arch/powerpc/kernel/misc_64.S| 52 arch/powerpc/mm/dma-noncoherent.c| 15 arch/powerpc/platforms/512x/mpc512x_shared.c | 10 +++--- arch/powerpc/platforms/85xx/smp.c| 6 ++-- arch/powerpc/sysdev/dart_iommu.c | 5 +-- drivers/ata/pata_bf54x.c | 3 +- drivers/char/agp/uninorth-agp.c | 6 ++-- drivers/gpu/drm/drm_cache.c | 3 +- drivers/macintosh/smu.c | 15 drivers/mmc/host/bfin_sdh.c | 3 +- drivers/mtd/nand/bf5xx_nand.c| 6 ++-- drivers/soc/fsl/qbman/dpaa_sys.h | 2 +- drivers/soc/fsl/qbman/qman_ccsr.c| 3 +- drivers/spi/spi-bfin5xx.c| 10 +++--- drivers/tty/serial/mpsc.c| 46 drivers/usb/musb/blackfin.c | 6 ++-- 17 files changed, 86 insertions(+), 152 deletions(-) [...] diff --git a/arch/powerpc/mm/dma-noncoherent.c b/arch/powerpc/mm/dma-noncoherent.c index 3825284..5fd3171 100644 --- a/arch/powerpc/mm/dma-noncoherent.c +++ b/arch/powerpc/mm/dma-noncoherent.c @@ -204,9 +204,9 @@ __dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t * kernel direct-mapped region for device DMA.
*/ { - unsigned long kaddr = (unsigned long)page_address(page); + void *kaddr = page_address(page); memset(page_address(page), 0, size); - flush_dcache_range(kaddr, kaddr + size); + flush_dcache_range(kaddr, size); } /* @@ -316,9 +316,6 @@ EXPORT_SYMBOL(__dma_free_coherent); */ void __dma_sync(void *vaddr, size_t size, int direction) { - unsigned long start = (unsigned long)vaddr; - unsigned long end = start + size; - switch (direction) { case DMA_NONE: BUG(); @@ -328,15 +325,15 @@ void __dma_sync(void *vaddr, size_t size, int direction) * the potential for discarding uncommitted data from the cache */ if ((start | end) & (L1_CACHE_BYTES - 1)) How can the above compile when 'start' and 'end' are removed ? Shouldn't it be replaced by if ((vaddr | size) & (L1_CACHE_BYTES - 1)) - flush_dcache_range(start, end); + flush_dcache_range(vaddr, size); else - invalidate_dcache_range(start, end); + invalidate_dcache_range(vaddr, size); break; case DMA_TO_DEVICE: /* writeback only */ - clean_dcache_range(start, end); + clean_dcache_range(vaddr, size); break; case DMA_BIDIRECTIONAL: /* writeback and invalidate */ - flush_dcache_range(start, end); + flush_dcache_range(vaddr, size); break; } } [...] Christophe
Re: [RFC Part1 PATCH v3 10/17] resource: Provide resource struct in resource walk callback
On Mon, Jul 24, 2017 at 02:07:50PM -0500, Brijesh Singh wrote: > From: Tom Lendacky> > In prep for a new function that will need additional resource information > during the resource walk, update the resource walk callback to pass the > resource structure. Since the current callback start and end arguments > are pulled from the resource structure, the callback functions can obtain > them from the resource structure directly. > > Signed-off-by: Tom Lendacky > Signed-off-by: Brijesh Singh > --- > arch/powerpc/kernel/machine_kexec_file_64.c | 12 +--- > arch/x86/kernel/crash.c | 18 +- > arch/x86/kernel/pmem.c | 2 +- > include/linux/ioport.h | 4 ++-- > include/linux/kexec.h | 2 +- > kernel/kexec_file.c | 5 +++-- > kernel/resource.c | 9 + > 7 files changed, 30 insertions(+), 22 deletions(-) Reviewed-by: Borislav Petkov -- Regards/Gruss, Boris. SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) --
[PATCH] powerpc/perf: Factor out PPMU_ONLY_COUNT_RUN check code from power8
There are some hardware events on Power systems which only count when the processor is not idle, and there are some fixed-function counters which count such events. For example, the "run cycles" event counts cycles when the processor is not idle. If the user asks to count cycles, we can use "run cycles" if this is a per-task event, since the processor is running when the task is running, by definition. We can't use "run cycles" if the user asks for "cycles" on a system-wide counter. Currently in power8 this check is done using the PPMU_ONLY_COUNT_RUN flag in the power8_get_alternatives() function. Based on the flag, events are switched if needed. This function should also be enabled in power9, so factor out the code to isa207_get_alternatives(). Fixes: efe881afdd999 ('powerpc/perf: Factor out event_alternative function') Reported-by: Anton Blanchard Signed-off-by: Madhavan Srinivasan --- arch/powerpc/perf/isa207-common.c | 29 +++-- arch/powerpc/perf/isa207-common.h | 4 ++-- arch/powerpc/perf/power8-pmu.c| 33 + arch/powerpc/perf/power9-pmu.c| 5 +++-- 4 files changed, 37 insertions(+), 34 deletions(-) diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c index 3f3aa9a7063a..26c81aaa4c2d 100644 --- a/arch/powerpc/perf/isa207-common.c +++ b/arch/powerpc/perf/isa207-common.c @@ -488,8 +488,8 @@ static int find_alternative(u64 event, const unsigned int ev_alt[][MAX_ALT], int return -1; } -int isa207_get_alternatives(u64 event, u64 alt[], - const unsigned int ev_alt[][MAX_ALT], int size) +int isa207_get_alternatives(u64 event, u64 alt[], int size, unsigned int flags, + const unsigned int ev_alt[][MAX_ALT]) { int i, j, num_alt = 0; u64 alt_event; @@ -505,5 +505,30 @@ int isa207_get_alternatives(u64 event, u64 alt[], } } + if (flags & PPMU_ONLY_COUNT_RUN) { + /* +* We're only counting in RUN state, so PM_CYC is equivalent to +* PM_RUN_CYC and PM_INST_CMPL === PM_RUN_INST_CMPL.
+*/ + j = num_alt; + for (i = 0; i < num_alt; ++i) { + switch (alt[i]) { + case 0x1e: /* PMC_CYC */ + alt[j++] = 0x600f4; /* PM_RUN_CYC */ + break; + case 0x600f4: + alt[j++] = 0x1e; + break; + case 0x2: /* PM_INST_CMPL */ + alt[j++] = 0x500fa; /* PM_RUN_INST_CMPL */ + break; + case 0x500fa: + alt[j++] = 0x2; + break; + } + } + num_alt = j; + } + return num_alt; } diff --git a/arch/powerpc/perf/isa207-common.h b/arch/powerpc/perf/isa207-common.h index 8acbe6e802c7..348cb6b9b911 100644 --- a/arch/powerpc/perf/isa207-common.h +++ b/arch/powerpc/perf/isa207-common.h @@ -287,8 +287,8 @@ int isa207_compute_mmcr(u64 event[], int n_ev, unsigned int hwc[], unsigned long mmcr[], struct perf_event *pevents[]); void isa207_disable_pmc(unsigned int pmc, unsigned long mmcr[]); -int isa207_get_alternatives(u64 event, u64 alt[], - const unsigned int ev_alt[][MAX_ALT], int size); +int isa207_get_alternatives(u64 event, u64 alt[], int size, unsigned int flags, + const unsigned int ev_alt[][MAX_ALT]); void isa207_get_mem_data_src(union perf_mem_data_src *dsrc, u32 flags, struct pt_regs *regs); void isa207_get_mem_weight(u64 *weight); diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c index 5463516e369b..0bd27769e81e 100644 --- a/arch/powerpc/perf/power8-pmu.c +++ b/arch/powerpc/perf/power8-pmu.c @@ -50,34 +50,11 @@ static const unsigned int event_alternatives[][MAX_ALT] = { static int power8_get_alternatives(u64 event, unsigned int flags, u64 alt[]) { - int i, j, num_alt = 0; - - num_alt = isa207_get_alternatives(event, alt, event_alternatives, - (int)ARRAY_SIZE(event_alternatives)); - if (flags & PPMU_ONLY_COUNT_RUN) { - /* -* We're only counting in RUN state, so PM_CYC is equivalent to -* PM_RUN_CYC and PM_INST_CMPL === PM_RUN_INST_CMPL. -*/ - j = num_alt; - for (i = 0; i < num_alt; ++i) { - switch (alt[i]) { - case PM_CYC: -
Re: [PATCH 1/2] KVM: PPC: e500: fix some NULL dereferences on error
On Mon, Jul 31, 2017 at 04:03:40PM +1000, Paul Mackerras wrote: > On Thu, Jul 13, 2017 at 10:38:29AM +0300, Dan Carpenter wrote: > > There are some error paths in kvmppc_core_vcpu_create_e500() where we > > forget to set the error code. It means that we return ERR_PTR(0) which > > is NULL and it results in a NULL pointer dereference in the caller. > > > > Signed-off-by: Dan Carpenter> > Are these user-triggerable, and therefore needing to go into 4.13 > and be back-ported to the stable trees? Or can they wait for 4.14? > These are static checker fixes... I imagine that they might be user triggerable with quite a bit of work but it's only a NULL dereference. regards, dan carpenter
[RESEND][PATCH V10 0/3] powernv : Add support for OPAL-OCC command/response interface
In P9, OCC (On-Chip-Controller) supports a shared-memory-based command-response interface. Within the shared memory there is an OPAL command buffer and OCC response buffer that can be used to send inband commands to OCC. The following commands are supported: 1) Set system powercap 2) Set CPU-GPU power shifting ratio 3) Clear min/max for OCC sensor groups Changes from V9: - Fixed return after erroring from mutex_lock_interruptible() - Added documentation - [RESEND] Fixed the version number of the patch-set in Subject Shilpasri G Bhat (3): powernv: powercap: Add support for powercap framework powernv: Add support to set power-shifting-ratio powernv: Add support to clear sensor groups data .../ABI/testing/sysfs-firmware-opal-powercap | 31 +++ Documentation/ABI/testing/sysfs-firmware-opal-psr | 18 ++ .../bindings/powerpc/opal/sensor-groups.txt| 23 ++ arch/powerpc/include/asm/opal-api.h| 8 +- arch/powerpc/include/asm/opal.h| 9 + arch/powerpc/include/uapi/asm/opal-occ.h | 23 ++ arch/powerpc/platforms/powernv/Makefile| 2 +- arch/powerpc/platforms/powernv/opal-occ.c | 116 ++ arch/powerpc/platforms/powernv/opal-powercap.c | 244 + arch/powerpc/platforms/powernv/opal-psr.c | 175 +++ arch/powerpc/platforms/powernv/opal-wrappers.S | 5 + arch/powerpc/platforms/powernv/opal.c | 10 + 12 files changed, 662 insertions(+), 2 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-firmware-opal-powercap create mode 100644 Documentation/ABI/testing/sysfs-firmware-opal-psr create mode 100644 Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt create mode 100644 arch/powerpc/include/uapi/asm/opal-occ.h create mode 100644 arch/powerpc/platforms/powernv/opal-occ.c create mode 100644 arch/powerpc/platforms/powernv/opal-powercap.c create mode 100644 arch/powerpc/platforms/powernv/opal-psr.c -- 1.8.3.1
[RESEND][PATCH V10 3/3] powernv: Add support to clear sensor groups data
Adds support for clearing different sensor groups. OCC inband sensor groups like CSM, Profiler, Job Scheduler can be cleared using this driver. The min/max of all sensors belonging to these sensor groups will be cleared. Signed-off-by: Shilpasri G Bhat --- .../bindings/powerpc/opal/sensor-groups.txt| 23 arch/powerpc/include/asm/opal-api.h| 3 +- arch/powerpc/include/asm/opal.h| 2 + arch/powerpc/include/uapi/asm/opal-occ.h | 23 arch/powerpc/platforms/powernv/Makefile| 2 +- arch/powerpc/platforms/powernv/opal-occ.c | 116 + arch/powerpc/platforms/powernv/opal-wrappers.S | 1 + arch/powerpc/platforms/powernv/opal.c | 3 + 8 files changed, 171 insertions(+), 2 deletions(-) create mode 100644 Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt create mode 100644 arch/powerpc/include/uapi/asm/opal-occ.h create mode 100644 arch/powerpc/platforms/powernv/opal-occ.c diff --git a/Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt b/Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt new file mode 100644 index 000..304b87c --- /dev/null +++ b/Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt @@ -0,0 +1,23 @@ +IBM OPAL Sensor Groups Binding +--- + +Node: /ibm,opal/sensor-groups + +Description: Contains sensor groups available in the Powernv P9 +servers. Each child node indicates a sensor group. + +- compatible : Should be "ibm,opal-occ-sensor-group" + +Each child node contains below properties: + +- type : String to indicate the type of sensor-group + +- sensor-group-id: Abstract unique identifier provided by firmware of + type which is used for sensor-group + operations like clearing the min/max history of all + sensors belonging to the group.
+ +- ibm,chip-id : Chip ID + +- sensors : Phandle array of child nodes of /ibm,opal/sensor/ + belonging to this group diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h index 92e31fd..0841659 100644 --- a/arch/powerpc/include/asm/opal-api.h +++ b/arch/powerpc/include/asm/opal-api.h @@ -195,7 +195,8 @@ #define OPAL_SET_POWERCAP 153 #define OPAL_GET_POWER_SHIFT_RATIO 154 #define OPAL_SET_POWER_SHIFT_RATIO 155 -#define OPAL_LAST 155 +#define OPAL_SENSOR_GROUPS_CLEAR 156 +#define OPAL_LAST 156 /* Device tree flags */ diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index b9ea77f..a716def 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -271,6 +271,7 @@ int64_t opal_xive_set_vp_info(uint64_t vp, int opal_set_powercap(u32 handle, int token, u32 pcap); int opal_get_power_shift_ratio(u32 handle, int token, u32 *psr); int opal_set_power_shift_ratio(u32 handle, int token, u32 psr); +int opal_sensor_groups_clear(u32 group_hndl, int token); /* Internal functions */ extern int early_init_dt_scan_opal(unsigned long node, const char *uname, @@ -351,6 +352,7 @@ static inline int opal_get_async_rc(struct opal_msg msg) void opal_powercap_init(void); void opal_psr_init(void); +int opal_sensor_groups_clear_history(u32 handle); #endif /* __ASSEMBLY__ */ diff --git a/arch/powerpc/include/uapi/asm/opal-occ.h b/arch/powerpc/include/uapi/asm/opal-occ.h new file mode 100644 index 000..97c45e2 --- /dev/null +++ b/arch/powerpc/include/uapi/asm/opal-occ.h @@ -0,0 +1,23 @@ +/* + * OPAL OCC command interface + * Supported on POWERNV platform + * + * (C) Copyright IBM 2017 + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2, or (at your option) + * any later version. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#ifndef _UAPI_ASM_POWERPC_OPAL_OCC_H_ +#define _UAPI_ASM_POWERPC_OPAL_OCC_H_ + +#define OPAL_OCC_IOCTL_CLEAR_SENSOR_GROUPS _IOR('o', 1, u32) + +#endif /* _UAPI_ASM_POWERPC_OPAL_OCC_H */ diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index 9ed7d33..f193b33 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -2,7 +2,7 @@ obj-y += setup.o opal-wrappers.o opal.o opal-async.o idle.o obj-y += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o obj-y += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o obj-y
[RESEND][PATCH V10 2/3] powernv: Add support to set power-shifting-ratio
This patch adds support to set power-shifting-ratio which hints to the firmware how to distribute/throttle power between different entities in a system (e.g. CPU vs GPU). This ratio is used by OCC for the power capping algorithm. Signed-off-by: Shilpasri G Bhat --- Documentation/ABI/testing/sysfs-firmware-opal-psr | 18 +++ arch/powerpc/include/asm/opal-api.h | 4 +- arch/powerpc/include/asm/opal.h | 3 + arch/powerpc/platforms/powernv/Makefile | 2 +- arch/powerpc/platforms/powernv/opal-psr.c | 175 ++ arch/powerpc/platforms/powernv/opal-wrappers.S| 2 + arch/powerpc/platforms/powernv/opal.c | 3 + 7 files changed, 205 insertions(+), 2 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-firmware-opal-psr create mode 100644 arch/powerpc/platforms/powernv/opal-psr.c diff --git a/Documentation/ABI/testing/sysfs-firmware-opal-psr b/Documentation/ABI/testing/sysfs-firmware-opal-psr new file mode 100644 index 000..cc2ece7 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-firmware-opal-psr @@ -0,0 +1,18 @@ +What: /sys/firmware/opal/psr +Date: August 2017 +Contact: Linux for PowerPC mailing list +Description: Power-Shift-Ratio directory for Powernv P9 servers + + Power-Shift-Ratio allows to provide hints to the firmware + to shift/throttle power between different entities in + the system. Each attribute in this directory indicates + a settable PSR. + +What: /sys/firmware/opal/psr/cpu_to_gpu_X +Date: August 2017 +Contact: Linux for PowerPC mailing list +Description: PSR sysfs attributes for Powernv P9 servers + + Power-Shift-Ratio between CPU and GPU for a given chip + with chip-id X. This file gives the ratio (0-100) + which is used by OCC for power-capping.
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h index c3e0c4a..92e31fd 100644 --- a/arch/powerpc/include/asm/opal-api.h +++ b/arch/powerpc/include/asm/opal-api.h @@ -193,7 +193,9 @@ #define OPAL_NPU_MAP_LPAR 148 #define OPAL_GET_POWERCAP 152 #define OPAL_SET_POWERCAP 153 -#define OPAL_LAST 153 +#define OPAL_GET_POWER_SHIFT_RATIO 154 +#define OPAL_SET_POWER_SHIFT_RATIO 155 +#define OPAL_LAST 155 /* Device tree flags */ diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index ec2087c..b9ea77f 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -269,6 +269,8 @@ int64_t opal_xive_set_vp_info(uint64_t vp, int64_t opal_xive_dump(uint32_t type, uint32_t id); int opal_get_powercap(u32 handle, int token, u32 *pcap); int opal_set_powercap(u32 handle, int token, u32 pcap); +int opal_get_power_shift_ratio(u32 handle, int token, u32 *psr); +int opal_set_power_shift_ratio(u32 handle, int token, u32 psr); /* Internal functions */ extern int early_init_dt_scan_opal(unsigned long node, const char *uname, @@ -348,6 +350,7 @@ static inline int opal_get_async_rc(struct opal_msg msg) void opal_wake_poller(void); void opal_powercap_init(void); +void opal_psr_init(void); #endif /* __ASSEMBLY__ */ diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index e79f806..9ed7d33 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -2,7 +2,7 @@ obj-y += setup.o opal-wrappers.o opal.o opal-async.o idle.o obj-y += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o obj-y += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o obj-y += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o -obj-y += opal-kmsg.o opal-powercap.o +obj-y += opal-kmsg.o opal-powercap.o opal-psr.o obj-$(CONFIG_SMP) += smp.o subcore.o subcore-asm.o obj-$(CONFIG_PCI) += pci.o pci-ioda.o npu-dma.o diff --git 
a/arch/powerpc/platforms/powernv/opal-psr.c b/arch/powerpc/platforms/powernv/opal-psr.c new file mode 100644 index 000..7313b7f --- /dev/null +++ b/arch/powerpc/platforms/powernv/opal-psr.c @@ -0,0 +1,175 @@ +/* + * PowerNV OPAL Power-Shift-Ratio interface + * + * Copyright 2017 IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#define pr_fmt(fmt) "opal-psr: " fmt + +#include +#include +#include + +#include + +DEFINE_MUTEX(psr_mutex); + +static struct kobject
[RESEND][PATCH V10 1/3] powernv: powercap: Add support for powercap framework
Adds a generic powercap framework to change the system powercap inband through the OPAL-OCC command/response interface. Signed-off-by: Shilpasri G Bhat --- .../ABI/testing/sysfs-firmware-opal-powercap | 31 +++ arch/powerpc/include/asm/opal-api.h| 5 +- arch/powerpc/include/asm/opal.h| 4 + arch/powerpc/platforms/powernv/Makefile| 2 +- arch/powerpc/platforms/powernv/opal-powercap.c | 244 + arch/powerpc/platforms/powernv/opal-wrappers.S | 2 + arch/powerpc/platforms/powernv/opal.c | 4 + 7 files changed, 290 insertions(+), 2 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-firmware-opal-powercap create mode 100644 arch/powerpc/platforms/powernv/opal-powercap.c diff --git a/Documentation/ABI/testing/sysfs-firmware-opal-powercap b/Documentation/ABI/testing/sysfs-firmware-opal-powercap new file mode 100644 index 000..c9b66ec --- /dev/null +++ b/Documentation/ABI/testing/sysfs-firmware-opal-powercap @@ -0,0 +1,31 @@ +What: /sys/firmware/opal/powercap +Date: August 2017 +Contact: Linux for PowerPC mailing list +Description: Powercap directory for Powernv (P8, P9) servers + + Each folder in this directory contains a + power-cappable component. + +What: /sys/firmware/opal/powercap/system-powercap + /sys/firmware/opal/powercap/system-powercap/powercap-min + /sys/firmware/opal/powercap/system-powercap/powercap-max + /sys/firmware/opal/powercap/system-powercap/powercap-current +Date: August 2017 +Contact: Linux for PowerPC mailing list +Description: System powercap directory and attributes applicable for + Powernv (P8, P9) servers + + This directory provides powercap information. It + contains below sysfs attributes: + + - powercap-min : This file provides the minimum + possible powercap in Watt units + + - powercap-max : This file provides the maximum + possible powercap in Watt units + + - powercap-current : This file provides the current + powercap set on the system. Writing to this file + creates a request for setting a new powercap.
The + powercap requested must be between powercap-min + and powercap-max. diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h index 3130a73..c3e0c4a 100644 --- a/arch/powerpc/include/asm/opal-api.h +++ b/arch/powerpc/include/asm/opal-api.h @@ -42,6 +42,7 @@ #define OPAL_I2C_STOP_ERR -24 #define OPAL_XIVE_PROVISIONING -31 #define OPAL_XIVE_FREE_ACTIVE -32 +#define OPAL_TIMEOUT -33 /* API Tokens (in r0) */ #define OPAL_INVALID_CALL -1 @@ -190,7 +191,9 @@ #define OPAL_NPU_INIT_CONTEXT 146 #define OPAL_NPU_DESTROY_CONTEXT 147 #define OPAL_NPU_MAP_LPAR 148 -#define OPAL_LAST 148 +#define OPAL_GET_POWERCAP 152 +#define OPAL_SET_POWERCAP 153 +#define OPAL_LAST 153 /* Device tree flags */ diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index 588fb1c..ec2087c 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -267,6 +267,8 @@ int64_t opal_xive_set_vp_info(uint64_t vp, int64_t opal_xive_free_irq(uint32_t girq); int64_t opal_xive_sync(uint32_t type, uint32_t id); int64_t opal_xive_dump(uint32_t type, uint32_t id); +int opal_get_powercap(u32 handle, int token, u32 *pcap); +int opal_set_powercap(u32 handle, int token, u32 pcap); /* Internal functions */ extern int early_init_dt_scan_opal(unsigned long node, const char *uname, @@ -345,6 +347,8 @@ static inline int opal_get_async_rc(struct opal_msg msg) void opal_wake_poller(void); +void opal_powercap_init(void); + #endif /* __ASSEMBLY__ */ #endif /* _ASM_POWERPC_OPAL_H */ diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index b5d98cb..e79f806 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -2,7 +2,7 @@ obj-y += setup.o opal-wrappers.o opal.o opal-async.o idle.o obj-y += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o obj-y += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o obj-y += opal-msglog.o opal-hmi.o opal-power.o 
opal-irqchip.o -obj-y += opal-kmsg.o +obj-y += opal-kmsg.o opal-powercap.o obj-$(CONFIG_SMP) += smp.o subcore.o subcore-asm.o obj-$(CONFIG_PCI) += pci.o pci-ioda.o npu-dma.o diff
[PATCH V9 1/3] powernv: powercap: Add support for powercap framework
Adds a generic powercap framework to change the system powercap inband through OPAL-OCC command/response interface. Signed-off-by: Shilpasri G Bhat--- .../ABI/testing/sysfs-firmware-opal-powercap | 31 +++ arch/powerpc/include/asm/opal-api.h| 5 +- arch/powerpc/include/asm/opal.h| 4 + arch/powerpc/platforms/powernv/Makefile| 2 +- arch/powerpc/platforms/powernv/opal-powercap.c | 244 + arch/powerpc/platforms/powernv/opal-wrappers.S | 2 + arch/powerpc/platforms/powernv/opal.c | 4 + 7 files changed, 290 insertions(+), 2 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-firmware-opal-powercap create mode 100644 arch/powerpc/platforms/powernv/opal-powercap.c diff --git a/Documentation/ABI/testing/sysfs-firmware-opal-powercap b/Documentation/ABI/testing/sysfs-firmware-opal-powercap new file mode 100644 index 000..c9b66ec --- /dev/null +++ b/Documentation/ABI/testing/sysfs-firmware-opal-powercap @@ -0,0 +1,31 @@ +What: /sys/firmware/opal/powercap +Date: August 2017 +Contact: Linux for PowerPC mailing list +Description: Powercap directory for Powernv (P8, P9) servers + + Each folder in this directory contains a + power-cappable component. + +What: /sys/firmware/opal/powercap/system-powercap + /sys/firmware/opal/powercap/system-powercap/powercap-min + /sys/firmware/opal/powercap/system-powercap/powercap-max + /sys/firmware/opal/powercap/system-powercap/powercap-current +Date: August 2017 +Contact: Linux for PowerPC mailing list +Description: System powercap directory and attributes applicable for + Powernv (P8, P9) servers + + This directory provides powercap information. It + contains below sysfs attributes: + + - powercap-min : This file provides the minimum + possible powercap in Watt units + + - powercap-max : This file provides the maximum + possible powercap in Watt units + + - powercap-current : This file provides the current + powercap set on the system. Writing to this file + creates a request for setting a new-powercap. 
The + powercap requested must be between powercap-min + and powercap-max. diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h index 3130a73..c3e0c4a 100644 --- a/arch/powerpc/include/asm/opal-api.h +++ b/arch/powerpc/include/asm/opal-api.h @@ -42,6 +42,7 @@ #define OPAL_I2C_STOP_ERR -24 #define OPAL_XIVE_PROVISIONING -31 #define OPAL_XIVE_FREE_ACTIVE -32 +#define OPAL_TIMEOUT -33 /* API Tokens (in r0) */ #define OPAL_INVALID_CALL -1 @@ -190,7 +191,9 @@ #define OPAL_NPU_INIT_CONTEXT 146 #define OPAL_NPU_DESTROY_CONTEXT 147 #define OPAL_NPU_MAP_LPAR 148 -#define OPAL_LAST 148 +#define OPAL_GET_POWERCAP 152 +#define OPAL_SET_POWERCAP 153 +#define OPAL_LAST 153 /* Device tree flags */ diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index 588fb1c..ec2087c 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -267,6 +267,8 @@ int64_t opal_xive_set_vp_info(uint64_t vp, int64_t opal_xive_free_irq(uint32_t girq); int64_t opal_xive_sync(uint32_t type, uint32_t id); int64_t opal_xive_dump(uint32_t type, uint32_t id); +int opal_get_powercap(u32 handle, int token, u32 *pcap); +int opal_set_powercap(u32 handle, int token, u32 pcap); /* Internal functions */ extern int early_init_dt_scan_opal(unsigned long node, const char *uname, @@ -345,6 +347,8 @@ static inline int opal_get_async_rc(struct opal_msg msg) void opal_wake_poller(void); +void opal_powercap_init(void); + #endif /* __ASSEMBLY__ */ #endif /* _ASM_POWERPC_OPAL_H */ diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index b5d98cb..e79f806 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -2,7 +2,7 @@ obj-y += setup.o opal-wrappers.o opal.o opal-async.o idle.o obj-y += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o obj-y += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o obj-y += opal-msglog.o opal-hmi.o opal-power.o 
opal-irqchip.o -obj-y += opal-kmsg.o +obj-y += opal-kmsg.o opal-powercap.o obj-$(CONFIG_SMP) += smp.o subcore.o subcore-asm.o obj-$(CONFIG_PCI) += pci.o pci-ioda.o npu-dma.o diff
[PATCH V9 3/3] powernv: Add support to clear sensor groups data
Adds support for clearing different sensor groups. OCC inband sensor groups like CSM, Profiler, Job Scheduler can be cleared using this driver. The min/max of all sensors belonging to these sensor groups will be cleared. Signed-off-by: Shilpasri G Bhat--- .../bindings/powerpc/opal/sensor-groups.txt| 23 arch/powerpc/include/asm/opal-api.h| 3 +- arch/powerpc/include/asm/opal.h| 2 + arch/powerpc/include/uapi/asm/opal-occ.h | 23 arch/powerpc/platforms/powernv/Makefile| 2 +- arch/powerpc/platforms/powernv/opal-occ.c | 116 + arch/powerpc/platforms/powernv/opal-wrappers.S | 1 + arch/powerpc/platforms/powernv/opal.c | 3 + 8 files changed, 171 insertions(+), 2 deletions(-) create mode 100644 Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt create mode 100644 arch/powerpc/include/uapi/asm/opal-occ.h create mode 100644 arch/powerpc/platforms/powernv/opal-occ.c diff --git a/Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt b/Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt new file mode 100644 index 000..304b87c --- /dev/null +++ b/Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt @@ -0,0 +1,23 @@ +IBM OPAL Sensor Groups Binding +--- + +Node: /ibm,opal/sensor-groups + +Description: Contains sensor groups available in the Powernv P9 +servers. Each child node indicates a sensor group. + +- compatible : Should be "ibm,opal-occ-sensor-group" + +Each child node contains below properties: + +- type : String to indicate the type of sensor-group + +- sensor-group-id: Abstract unique identifier provided by firmware of + type which is used for sensor-group + operations like clearing the min/max history of all + sensors belonging to the group. 
+ +- ibm,chip-id : Chip ID + +- sensors : Phandle array of child nodes of /ibm,opal/sensor/ + belonging to this group diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h index 92e31fd..0841659 100644 --- a/arch/powerpc/include/asm/opal-api.h +++ b/arch/powerpc/include/asm/opal-api.h @@ -195,7 +195,8 @@ #define OPAL_SET_POWERCAP 153 #define OPAL_GET_POWER_SHIFT_RATIO 154 #define OPAL_SET_POWER_SHIFT_RATIO 155 -#define OPAL_LAST 155 +#define OPAL_SENSOR_GROUPS_CLEAR 156 +#define OPAL_LAST 156 /* Device tree flags */ diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index b9ea77f..a716def 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -271,6 +271,7 @@ int64_t opal_xive_set_vp_info(uint64_t vp, int opal_set_powercap(u32 handle, int token, u32 pcap); int opal_get_power_shift_ratio(u32 handle, int token, u32 *psr); int opal_set_power_shift_ratio(u32 handle, int token, u32 psr); +int opal_sensor_groups_clear(u32 group_hndl, int token); /* Internal functions */ extern int early_init_dt_scan_opal(unsigned long node, const char *uname, @@ -351,6 +352,7 @@ static inline int opal_get_async_rc(struct opal_msg msg) void opal_powercap_init(void); void opal_psr_init(void); +int opal_sensor_groups_clear_history(u32 handle); #endif /* __ASSEMBLY__ */ diff --git a/arch/powerpc/include/uapi/asm/opal-occ.h b/arch/powerpc/include/uapi/asm/opal-occ.h new file mode 100644 index 000..97c45e2 --- /dev/null +++ b/arch/powerpc/include/uapi/asm/opal-occ.h @@ -0,0 +1,23 @@ +/* + * OPAL OCC command interface + * Supported on POWERNV platform + * + * (C) Copyright IBM 2017 + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2, or (at your option) + * any later version. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#ifndef _UAPI_ASM_POWERPC_OPAL_OCC_H_ +#define _UAPI_ASM_POWERPC_OPAL_OCC_H_ + +#define OPAL_OCC_IOCTL_CLEAR_SENSOR_GROUPS _IOR('o', 1, u32) + +#endif /* _UAPI_ASM_POWERPC_OPAL_OCC_H */ diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index 9ed7d33..f193b33 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -2,7 +2,7 @@ obj-y += setup.o opal-wrappers.o opal.o opal-async.o idle.o obj-y += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o obj-y += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o obj-y
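The new uapi header defines OPAL_OCC_IOCTL_CLEAR_SENSOR_GROUPS as `_IOR('o', 1, u32)`. As a sketch of what that macro packs together, here is a re-creation of the generic asm-generic/ioctl.h layout. Note this is illustrative: powerpc itself overrides the size/dir field widths (13/3 instead of 14/2), so the numeric value on powerpc differs; the packing scheme is what matters:

```c
#include <assert.h>
#include <stdint.h>

/* Re-creation of the generic _IOR() encoding (asm-generic/ioctl.h):
 * the 32-bit ioctl number is packed as nr:8 | type:8 | size:14 | dir:2,
 * with _IOC_READ = 2 supplying the direction bits for _IOR. */
static uint32_t generic_ior(uint32_t type, uint32_t nr, uint32_t size)
{
    const uint32_t nrbits = 8, typebits = 8, sizebits = 14;
    const uint32_t ioc_read = 2;

    return (ioc_read << (nrbits + typebits + sizebits)) | /* dir */
           (size << (nrbits + typebits)) |                /* sizeof(arg) */
           (type << nrbits) |                             /* 'o' */
           nr;                                            /* command nr */
}
```

With the generic widths, `_IOR('o', 1, u32)` works out to 0x80046f01: direction 2 in the top two bits, size 4, type 0x6f ('o'), number 1.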
[PATCH V9 2/3] powernv: Add support to set power-shifting-ratio
This patch adds support to set the power-shifting-ratio, which hints the firmware how to distribute/throttle power between different entities in a system (e.g. CPU vs GPU). This ratio is used by the OCC power-capping algorithm. Signed-off-by: Shilpasri G Bhat --- Documentation/ABI/testing/sysfs-firmware-opal-psr | 18 +++ arch/powerpc/include/asm/opal-api.h | 4 +- arch/powerpc/include/asm/opal.h | 3 + arch/powerpc/platforms/powernv/Makefile | 2 +- arch/powerpc/platforms/powernv/opal-psr.c | 175 ++ arch/powerpc/platforms/powernv/opal-wrappers.S| 2 + arch/powerpc/platforms/powernv/opal.c | 3 + 7 files changed, 205 insertions(+), 2 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-firmware-opal-psr create mode 100644 arch/powerpc/platforms/powernv/opal-psr.c diff --git a/Documentation/ABI/testing/sysfs-firmware-opal-psr b/Documentation/ABI/testing/sysfs-firmware-opal-psr new file mode 100644 index 000..cc2ece7 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-firmware-opal-psr @@ -0,0 +1,18 @@ +What: /sys/firmware/opal/psr +Date: August 2017 +Contact: Linux for PowerPC mailing list +Description: Power-Shift-Ratio directory for Powernv P9 servers + + Power-Shift-Ratio allows providing hints to the firmware + to shift/throttle power between different entities in + the system. Each attribute in this directory indicates + a settable PSR. + +What: /sys/firmware/opal/psr/cpu_to_gpu_X +Date: August 2017 +Contact: Linux for PowerPC mailing list +Description: PSR sysfs attributes for Powernv P9 servers + + Power-Shift-Ratio between CPU and GPU for a given chip + with chip-id X. This file gives the ratio (0-100) + which is used by OCC for power-capping. 
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h index c3e0c4a..92e31fd 100644 --- a/arch/powerpc/include/asm/opal-api.h +++ b/arch/powerpc/include/asm/opal-api.h @@ -193,7 +193,9 @@ #define OPAL_NPU_MAP_LPAR 148 #define OPAL_GET_POWERCAP 152 #define OPAL_SET_POWERCAP 153 -#define OPAL_LAST 153 +#define OPAL_GET_POWER_SHIFT_RATIO 154 +#define OPAL_SET_POWER_SHIFT_RATIO 155 +#define OPAL_LAST 155 /* Device tree flags */ diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index ec2087c..b9ea77f 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -269,6 +269,8 @@ int64_t opal_xive_set_vp_info(uint64_t vp, int64_t opal_xive_dump(uint32_t type, uint32_t id); int opal_get_powercap(u32 handle, int token, u32 *pcap); int opal_set_powercap(u32 handle, int token, u32 pcap); +int opal_get_power_shift_ratio(u32 handle, int token, u32 *psr); +int opal_set_power_shift_ratio(u32 handle, int token, u32 psr); /* Internal functions */ extern int early_init_dt_scan_opal(unsigned long node, const char *uname, @@ -348,6 +350,7 @@ static inline int opal_get_async_rc(struct opal_msg msg) void opal_wake_poller(void); void opal_powercap_init(void); +void opal_psr_init(void); #endif /* __ASSEMBLY__ */ diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index e79f806..9ed7d33 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -2,7 +2,7 @@ obj-y += setup.o opal-wrappers.o opal.o opal-async.o idle.o obj-y += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o obj-y += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o obj-y += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o -obj-y += opal-kmsg.o opal-powercap.o +obj-y += opal-kmsg.o opal-powercap.o opal-psr.o obj-$(CONFIG_SMP) += smp.o subcore.o subcore-asm.o obj-$(CONFIG_PCI) += pci.o pci-ioda.o npu-dma.o diff --git 
a/arch/powerpc/platforms/powernv/opal-psr.c b/arch/powerpc/platforms/powernv/opal-psr.c new file mode 100644 index 000..7313b7f --- /dev/null +++ b/arch/powerpc/platforms/powernv/opal-psr.c @@ -0,0 +1,175 @@ +/* + * PowerNV OPAL Power-Shift-Ratio interface + * + * Copyright 2017 IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#define pr_fmt(fmt) "opal-psr: " fmt + +#include +#include +#include + +#include + +DEFINE_MUTEX(psr_mutex); + +static struct kobject
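The cpu_to_gpu_X sysfs file takes a ratio of 0-100, so the store path has to validate user input before handing it to opal_set_power_shift_ratio(). A hedged userspace sketch of that validation — `parse_psr` is a hypothetical name, not code from opal-psr.c:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical mirror of the validation a cpu_to_gpu_X store callback
 * needs: the Power-Shift-Ratio is a percentage, so only 0-100 may
 * reach opal_set_power_shift_ratio().  Returns the parsed ratio on
 * success, or -EINVAL for non-numeric or out-of-range input. */
static int parse_psr(const char *buf)
{
    char *end;
    long val = strtol(buf, &end, 10);

    if (end == buf || val < 0 || val > 100)
        return -EINVAL;
    return (int)val;
}
```

The kernel side would use kstrtou32() instead of strtol(), but the bounds check is the same.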
[PATCH V9 0/3] powernv : Add support for OPAL-OCC command/response interface
In P9, OCC (On-Chip-Controller) supports a shared-memory-based command-response interface. Within the shared memory there is an OPAL command buffer and OCC response buffer that can be used to send inband commands to OCC. The following commands are supported: 1) Set system powercap 2) Set CPU-GPU power shifting ratio 3) Clear min/max for OCC sensor groups Changes from V8: - Fixed return after erroring from mutex_lock_interruptible() - Added documentation Shilpasri G Bhat (3): powernv: powercap: Add support for powercap framework powernv: Add support to set power-shifting-ratio powernv: Add support to clear sensor groups data .../ABI/testing/sysfs-firmware-opal-powercap | 31 +++ Documentation/ABI/testing/sysfs-firmware-opal-psr | 18 ++ .../bindings/powerpc/opal/sensor-groups.txt| 23 ++ arch/powerpc/include/asm/opal-api.h| 8 +- arch/powerpc/include/asm/opal.h| 9 + arch/powerpc/include/uapi/asm/opal-occ.h | 23 ++ arch/powerpc/platforms/powernv/Makefile| 2 +- arch/powerpc/platforms/powernv/opal-occ.c | 116 ++ arch/powerpc/platforms/powernv/opal-powercap.c | 244 + arch/powerpc/platforms/powernv/opal-psr.c | 175 +++ arch/powerpc/platforms/powernv/opal-wrappers.S | 5 + arch/powerpc/platforms/powernv/opal.c | 10 + 12 files changed, 662 insertions(+), 2 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-firmware-opal-powercap create mode 100644 Documentation/ABI/testing/sysfs-firmware-opal-psr create mode 100644 Documentation/devicetree/bindings/powerpc/opal/sensor-groups.txt create mode 100644 arch/powerpc/include/uapi/asm/opal-occ.h create mode 100644 arch/powerpc/platforms/powernv/opal-occ.c create mode 100644 arch/powerpc/platforms/powernv/opal-powercap.c create mode 100644 arch/powerpc/platforms/powernv/opal-psr.c -- 1.8.3.1
Re: [PATCH] fs: convert a pile of fsync routines to errseq_t based reporting
On Fri 28-07-17 10:23:21, Jeff Layton wrote: > From: Jeff Layton> > This patch converts most of the in-kernel filesystems that do writeback > out of the pagecache to report errors using the errseq_t-based > infrastructure that was recently added. This allows them to report > errors once for each open file description. > > Most filesystems have a fairly straightforward fsync operation. They > call filemap_write_and_wait_range to write back all of the data and > wait on it, and then (sometimes) sync out the metadata. > > For those filesystems this is a straightforward conversion from calling > filemap_write_and_wait_range in their fsync operation to calling > file_write_and_wait_range. > > Signed-off-by: Jeff Layton This all looks rather obvious. Feel free to add: Acked-by: Jan Kara Honza > --- > arch/powerpc/platforms/cell/spufs/file.c | 2 +- > drivers/staging/lustre/lustre/llite/file.c | 2 +- > drivers/video/fbdev/core/fb_defio.c| 2 +- > fs/9p/vfs_file.c | 4 ++-- > fs/affs/file.c | 2 +- > fs/afs/write.c | 2 +- > fs/cifs/file.c | 4 ++-- > fs/exofs/file.c| 2 +- > fs/f2fs/file.c | 2 +- > fs/hfs/inode.c | 2 +- > fs/hfsplus/inode.c | 2 +- > fs/hostfs/hostfs_kern.c| 2 +- > fs/hpfs/file.c | 2 +- > fs/jffs2/file.c| 2 +- > fs/jfs/file.c | 2 +- > fs/ncpfs/file.c| 2 +- > fs/ntfs/dir.c | 2 +- > fs/ntfs/file.c | 2 +- > fs/ocfs2/file.c| 2 +- > fs/reiserfs/dir.c | 2 +- > fs/reiserfs/file.c | 2 +- > fs/ubifs/file.c| 2 +- > 22 files changed, 24 insertions(+), 24 deletions(-) > > Rolling up all of these conversions into a single patch, as Christoph > Hellwig suggested. Most of these are not tested, but the conversion > here is fairly straightforward. > > Any maintainers who object, please let me know and I'll yank that > part out of this patch. 
> > diff --git a/arch/powerpc/platforms/cell/spufs/file.c > b/arch/powerpc/platforms/cell/spufs/file.c > index ae2f740a82f1..5ffcdeb1eb17 100644 > --- a/arch/powerpc/platforms/cell/spufs/file.c > +++ b/arch/powerpc/platforms/cell/spufs/file.c > @@ -1749,7 +1749,7 @@ static int spufs_mfc_flush(struct file *file, > fl_owner_t id) > static int spufs_mfc_fsync(struct file *file, loff_t start, loff_t end, int > datasync) > { > struct inode *inode = file_inode(file); > - int err = filemap_write_and_wait_range(inode->i_mapping, start, end); > + int err = file_write_and_wait_range(file, start, end); > if (!err) { > inode_lock(inode); > err = spufs_mfc_flush(file, NULL); > diff --git a/drivers/staging/lustre/lustre/llite/file.c > b/drivers/staging/lustre/lustre/llite/file.c > index ab1c85c1ed38..f7d07735ac66 100644 > --- a/drivers/staging/lustre/lustre/llite/file.c > +++ b/drivers/staging/lustre/lustre/llite/file.c > @@ -2364,7 +2364,7 @@ int ll_fsync(struct file *file, loff_t start, loff_t > end, int datasync) > PFID(ll_inode2fid(inode)), inode); > ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_FSYNC, 1); > > - rc = filemap_write_and_wait_range(inode->i_mapping, start, end); > + rc = file_write_and_wait_range(file, start, end); > inode_lock(inode); > > /* catch async errors that were recorded back when async writeback > diff --git a/drivers/video/fbdev/core/fb_defio.c > b/drivers/video/fbdev/core/fb_defio.c > index 37f69c061210..487d5e336e1b 100644 > --- a/drivers/video/fbdev/core/fb_defio.c > +++ b/drivers/video/fbdev/core/fb_defio.c > @@ -69,7 +69,7 @@ int fb_deferred_io_fsync(struct file *file, loff_t start, > loff_t end, int datasy > { > struct fb_info *info = file->private_data; > struct inode *inode = file_inode(file); > - int err = filemap_write_and_wait_range(inode->i_mapping, start, end); > + int err = file_write_and_wait_range(file, start, end); > if (err) > return err; > > diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c > index 3de3b4a89d89..4802d75b3cf7 100644 > 
--- a/fs/9p/vfs_file.c > +++ b/fs/9p/vfs_file.c > @@ -445,7 +445,7 @@ static int v9fs_file_fsync(struct file *filp, loff_t > start, loff_t end, > struct p9_wstat wstat; > int retval; > > - retval = filemap_write_and_wait_range(inode->i_mapping, start, end); > + retval = file_write_and_wait_range(filp, start, end); > if (retval) > return retval; > > @@ -468,7 +468,7 @@
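The point of the errseq_t conversion is "report errors once for each open file description". A toy userspace model of those semantics — deliberately simpler than the real errseq_t, which packs the error, counter, and a seen bit into a single word — may make the mechanism clearer:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (not the real errseq_t layout): the mapping keeps an
 * (error, sequence) pair; each open file description keeps the
 * sequence it last saw; fsync reports an error only when the mapping's
 * sequence has moved past the file's snapshot -- i.e. once per
 * description per error instance. */
struct toy_errseq { int err; uint32_t seq; };

static void toy_record_error(struct toy_errseq *map, int err)
{
    map->err = err;
    map->seq++;    /* a new error instance to report */
}

static int toy_file_check(struct toy_errseq *map, uint32_t *since)
{
    if (*since == map->seq)
        return 0;  /* nothing new for this opener */
    *since = map->seq;
    return map->err;
}

/* Scenario: opener A exists before a writeback error, opener B opens
 * after it.  A sees -5 (-EIO) exactly once; B, which sampled the
 * sequence after the error, sees nothing.  Returns 0 if all holds. */
static int toy_demo(void)
{
    struct toy_errseq map = { 0, 0 };
    uint32_t fa = map.seq, fb;

    toy_record_error(&map, -5);
    fb = map.seq;
    if (toy_file_check(&map, &fa) != -5)
        return 1;
    if (toy_file_check(&map, &fa) != 0)
        return 2;
    if (toy_file_check(&map, &fb) != 0)
        return 3;
    return 0;
}
```

file_write_and_wait_range() is what wires this check into the fsync path, replacing the plain filemap_write_and_wait_range() call in each converted filesystem.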
Re: powerpc/smp: Call smp_ops->setup_cpu() directly on the boot CPU
On Thu, 2017-07-27 at 13:23:37 UTC, Michael Ellerman wrote: > In smp_cpus_done() we need to call smp_ops->setup_cpu() for the boot > CPU, which means it has to run *on* the boot CPU. > > In the past we ensured it ran on the boot CPU by changing the CPU > affinity mask of current directly. That was removed in commit > 6d11b87d55eb ("powerpc/smp: Replace open coded task affinity logic"), > and replaced with a work queue call. > > Unfortunately using a work queue leads to a lockdep warning, now that > the CPU hotplug lock is a regular semaphore: > > == > WARNING: possible circular locking dependency detected > ... > kworker/0:1/971 is trying to acquire lock: >(cpu_hotplug_lock.rw_sem){++}, at: [] > apply_workqueue_attrs+0x34/0xa0 > > but task is already holding lock: >(()){+.+.+.}, at: [] > process_one_work+0x25c/0x800 > ... >CPU0CPU1 > > lock(()); >lock(cpu_hotplug_lock.rw_sem); >lock(()); > lock(cpu_hotplug_lock.rw_sem); > > Although the deadlock can't happen in practice, because > smp_cpus_done() only runs in early boot before CPU hotplug is allowed, > lockdep can't tell that. > > Luckily in commit 8fb12156b8db ("init: Pin init task to the boot CPU, > initially") tglx changed the generic code to pin init to the boot CPU > to begin with. The unpinning of init from the boot CPU happens in > sched_init_smp(), which is called after smp_cpus_done(). > > So smp_cpus_done() is always called on the boot CPU, which means we > don't need the work queue call at all - and the lockdep warning goes > away. > > Signed-off-by: Michael Ellerman

Applied to powerpc fixes.

https://git.kernel.org/powerpc/c/7b7622bb95eb587cbaa79608e47b83

cheers
Re: powerpc/boot: Fix 64-bit boot wrapper build with non-biarch compiler
On Wed, 2017-07-26 at 13:19:04 UTC, Michael Ellerman wrote: > Historically the boot wrapper was always built 32-bit big endian, even > for 64-bit kernels. That was because old firmwares didn't necessarily > support booting a 64-bit image. Because of that arch/powerpc/boot/Makefile > uses CROSS32CC for compilation. > > However when we added 64-bit little endian support, we also added > support for building the boot wrapper 64-bit. However we kept using > CROSS32CC, because in most cases it is just CC and everything works. > > However if the user doesn't specify CROSS32_COMPILE (which no one ever > does AFAIK), and CC is *not* biarch (32/64-bit capable), then CROSS32CC > becomes just "gcc". On native systems that is probably OK, but if we're > cross building it definitely isn't, leading to eg: > > gcc ... -m64 -mlittle-endian -mabi=elfv2 ... arch/powerpc/boot/cpm-serial.c > gcc: error: unrecognized argument in option '-mabi=elfv2' > gcc: error: unrecognized command line option '-mlittle-endian' > make: *** [zImage] Error 2 > > To fix it, stop using CROSS32CC, because we may or may not be building > 32-bit. Instead setup a BOOTCC, which defaults to CC, and only use > CROSS32_COMPILE if it's set and we're building for 32-bit. > > Fixes: 147c05168fc8 ("powerpc/boot: Add support for 64bit little endian > wrapper") > Signed-off-by: Michael Ellerman

Applied to powerpc fixes.

https://git.kernel.org/powerpc/c/65c5ec11c25eff6ba6e9b1cbfff014

cheers
Re: powerpc/powernv/pci: Return failure for some uses of dma_set_mask()
On Wed, 2017-07-26 at 05:26:40 UTC, Alistair Popple wrote: > Commit 8e3f1b1d8255 ("powerpc/powernv/pci: Enable 64-bit devices to access > >4GB DMA space") introduced the ability for PCI device drivers to request a > DMA mask between 64 and 32 bits and actually get a mask greater than > 32-bits. However currently if certain machine configuration dependent > conditions are not met, the code silently falls back to a 32-bit mask. > > This makes it hard for device drivers to detect which mask they actually > got. Instead we should return an error when the request could not be > fulfilled which allows drivers to either fallback or implement other > workarounds as documented in DMA-API-HOWTO.txt. > > Signed-off-by: Alistair Popple> Acked-by: Russell Currey

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/253fd51e2f533552ae35a0c661705d

cheers
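The fallback DMA-API-HOWTO.txt suggests can be modeled in a few lines. `fake_set_mask` and `pick_mask_bits` below are hypothetical stand-ins, not kernel API; they only illustrate the try-wide-then-degrade pattern a driver should use now that the platform code returns an error instead of silently narrowing the mask:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for DMA_BIT_MASK(): all-ones mask of the given width. */
#define FAKE_DMA_BIT_MASK(n) ((n) == 64 ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical platform hook: pretend this machine configuration can
 * only honour masks up to 32 bits, the case the patch makes visible. */
static int fake_set_mask(uint64_t mask)
{
    return mask > FAKE_DMA_BIT_MASK(32) ? -5 /* -EIO */ : 0;
}

/* The DMA-API-HOWTO pattern: ask for the wide mask first, check the
 * return value, and degrade to 32-bit rather than assuming success.
 * Returns the mask width that actually took effect, or -1. */
static int pick_mask_bits(void)
{
    if (fake_set_mask(FAKE_DMA_BIT_MASK(64)) == 0)
        return 64;
    if (fake_set_mask(FAKE_DMA_BIT_MASK(32)) == 0)
        return 32;
    return -1;
}
```

Before this patch a driver had no way to tell that its 64-bit request had quietly become 32-bit; with the error return, the fallback branch above actually runs.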
Re: powerpc/tm: fix TM SPRs in core dump file
On Wed, 2017-07-19 at 05:44:13 UTC, Gustavo Romero wrote: > Currently flush_tmregs_to_thread() does not update accordingly the thread > structures from live state before a core dump rendering wrong values of > TFHAR, TFIAR, and TEXASR in core dump files. > > This commit fixes it by copying from live state to the appropriate thread > structures when it's necessary. > > Signed-off-by: Gustavo Romero> Reviewed-by: Cyril Bur

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/cd63f3cf1d59b7ad8419eba1cac8f9

cheers
Re: [PATCH 1/2] KVM: PPC: e500: fix some NULL dereferences on error
On Thu, Jul 13, 2017 at 10:38:29AM +0300, Dan Carpenter wrote: > There are some error paths in kvmppc_core_vcpu_create_e500() where we > forget to set the error code. It means that we return ERR_PTR(0) which > is NULL and it results in a NULL pointer dereference in the caller. > > Signed-off-by: Dan Carpenter

Are these user-triggerable, and therefore needing to go into 4.13 and be back-ported to the stable trees? Or can they wait for 4.14?

Paul.
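The bug class here is a property of the kernel's ERR_PTR() convention: error codes live in the top page of the address space, so ERR_PTR(0) is indistinguishable from NULL and IS_ERR() will not catch it. A small userspace re-creation (lowercase names to mark it as a model; the real helpers live in include/linux/err.h):

```c
#include <assert.h>
#include <stddef.h>

/* Userspace re-creation of the kernel's err.h helpers.  Error pointers
 * encode -errno values as addresses in the top MAX_ERRNO bytes of the
 * address space; IS_ERR() tests only that range.  ERR_PTR(0) therefore
 * is plain NULL, which IS_ERR() does not flag -- exactly why a
 * forgotten error code leads to a NULL dereference in the caller. */
#define MAX_ERRNO 4095

static void *err_ptr(long error) { return (void *)error; }
static long ptr_err(const void *ptr) { return (long)ptr; }
static int is_err(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}
```

A caller doing `if (is_err(p)) return ptr_err(p);` sails straight past `err_ptr(0)` and then dereferences NULL, which is why the fix is to set a real negative error code on every failure path.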