Re: [PATCH v3 0/5] of: add of_property_alloc/free() and of_node_alloc()
Sorry for the lack of response, it's been a busy week. I will get to this soon. -Frank On 6/20/22 06:41, Clément Léger wrote: > In order to be able to create new nodes and properties dynamically from > drivers, add of_property_alloc/free() and of_node_alloc(). These > functions can be used to create new nodes and properties flagged with > OF_DYNAMIC and to free them. > > Some powerpc code was already doing such operations and thus, these > functions have been used to replace the manual creation of nodes and > properties. This code has been more than simply replaced to allow using > of_node_put() rather than a manual deletion of the properties. > Unfortunately, as I don't own a powerpc platform, it would need to be > tested. > > --- > > Changes in V3: > - Remove gfpflag attribute from of_node_alloc() and of_property_alloc(). > - Removed allocflags from __of_node_dup(). > - Rework powerpc code to only use of_node_put(). > - Fix properties free using of_node_property in OF unittests. > > Changes in V2: > - Remove of_node_free() > - Rework property allocation to allocate both property and value with > 1 allocation > - Rework node allocation to allocate name at the same time the node is > allocated > - Remove extern from definitions > - Remove of_property_alloc() value_len parameter and add more > explanation for the arguments > - Add a check in of_property_free to check OF_DYNAMIC flag > - Add a commit which constify the property argument of > of_property_check_flags() > > Clément Léger (5): > of: constify of_property_check_flags() prop argument > of: remove __of_node_dup() allocflags parameter > of: dynamic: add of_property_alloc() and of_property_free() > of: dynamic: add of_node_alloc() > powerpc/pseries: use of_property_alloc/free() and of_node_alloc() > > arch/powerpc/platforms/pseries/dlpar.c| 62 +--- > .../platforms/pseries/hotplug-memory.c| 21 +-- > arch/powerpc/platforms/pseries/reconfig.c | 123 ++-- > drivers/of/dynamic.c | 137 -- > drivers/of/of_private.h | 19 ++- > drivers/of/overlay.c | 2 +- > drivers/of/unittest.c | 24 ++- > include/linux/of.h| 24 ++- > 8 files changed, 191 insertions(+), 221 deletions(-) >
Re: [PATCH v2] of/fdt: Rework early_init_dt_scan_memory() to call directly
On 12/8/21 10:58 AM, Rob Herring wrote: > Use of the of_scan_flat_dt() function predates libfdt and is discouraged > as libfdt provides a nicer set of APIs. Rework > early_init_dt_scan_memory() to be called directly and use libfdt. > > Cc: John Crispin > Cc: Thomas Bogendoerfer > Cc: Michael Ellerman > Cc: Benjamin Herrenschmidt > Cc: Paul Mackerras > Cc: Frank Rowand > Cc: linux-m...@vger.kernel.org > Cc: linuxppc-dev@lists.ozlabs.org > Signed-off-by: Rob Herring > --- > v2: > - ralink: Use 'if' instead of 'else if' > - early_init_dt_scan_memory: continue instead of return on no reg > - Fix indentation > --- > arch/mips/ralink/of.c | 19 +++ > arch/powerpc/kernel/prom.c | 16 - > drivers/of/fdt.c | 68 -- > include/linux/of_fdt.h | 3 +- > 4 files changed, 49 insertions(+), 57 deletions(-) Reviewed-by: Frank Rowand -Frank > > diff --git a/arch/mips/ralink/of.c b/arch/mips/ralink/of.c > index 0135376c5de5..35a87a2da10b 100644 > --- a/arch/mips/ralink/of.c > +++ b/arch/mips/ralink/of.c > @@ -53,17 +53,6 @@ void __init device_tree_init(void) > unflatten_and_copy_device_tree(); > } > > -static int memory_dtb; > - > -static int __init early_init_dt_find_memory(unsigned long node, > - const char *uname, int depth, void *data) > -{ > - if (depth == 1 && !strcmp(uname, "memory@0")) > - memory_dtb = 1; > - > - return 0; > -} > - > void __init plat_mem_setup(void) > { > void *dtb; > @@ -77,10 +66,10 @@ void __init plat_mem_setup(void) > dtb = get_fdt(); > __dt_setup_arch(dtb); > > - of_scan_flat_dt(early_init_dt_find_memory, NULL); > - if (memory_dtb) > - of_scan_flat_dt(early_init_dt_scan_memory, NULL); > - else if (soc_info.mem_detect) > + if (!early_init_dt_scan_memory()) > + return; > + > + if (soc_info.mem_detect) > soc_info.mem_detect(); > else if (soc_info.mem_size) > memblock_add(soc_info.mem_base, soc_info.mem_size * SZ_1M); > diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c > index 6e1a106f02eb..63762a3b75e8 100644 > --- a/arch/powerpc/kernel/prom.c > +++ b/arch/powerpc/kernel/prom.c > @@ -532,19 +532,19 @@ static int __init early_init_drmem_lmb(struct > drmem_lmb *lmb, > } > #endif /* CONFIG_PPC_PSERIES */ > > -static int __init early_init_dt_scan_memory_ppc(unsigned long node, > - const char *uname, > - int depth, void *data) > +static int __init early_init_dt_scan_memory_ppc(void) > { > #ifdef CONFIG_PPC_PSERIES > - if (depth == 1 && > - strcmp(uname, "ibm,dynamic-reconfiguration-memory") == 0) { > + const void *fdt = initial_boot_params; > + int node = fdt_path_offset(fdt, "/ibm,dynamic-reconfiguration-memory"); > + > + if (node > 0) { > walk_drmem_lmbs_early(node, NULL, early_init_drmem_lmb); > return 0; > } > #endif > > - return early_init_dt_scan_memory(node, uname, depth, data); > + return early_init_dt_scan_memory(); > } > > /* > @@ -749,7 +749,7 @@ void __init early_init_devtree(void *params) > > /* Scan memory nodes and rebuild MEMBLOCKs */ > early_init_dt_scan_root(); > - of_scan_flat_dt(early_init_dt_scan_memory_ppc, NULL); > + early_init_dt_scan_memory_ppc(); > > parse_early_param(); > > @@ -858,7 +858,7 @@ void __init early_get_first_memblock_info(void *params, > phys_addr_t *size) >*/ > add_mem_to_memblock = 0; > early_init_dt_scan_root(); > - of_scan_flat_dt(early_init_dt_scan_memory_ppc, NULL); > + early_init_dt_scan_memory_ppc(); > add_mem_to_memblock = 1; > > if (size) > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c > index 5e216555fe4f..a835c458f50a 100644 > --- a/drivers/of/fdt.c > +++ b/drivers/of/fdt.c > @@ -1078,49 +1078,53 @@ u64 __init dt_mem_next_cell(int s, const __be32 > **cellp) > /* > * early_init_dt_scan_memory - Look for and parse memory nodes > */ > -int __init early_init_dt_scan_memory(unsigned long node, const char *uname, > - int depth, void *data) > +int __init early_init_dt_scan_memory(void) > { > - const char *type = of_get_flat_dt_prop(node, "device_type", NULL); > - const __be32 *reg, *endp; > - int l; > - bool hotpluggable; > + int no
Re: [PATCH 0/3] of/fdt: Rework early FDT scanning functions
On 11/18/21 1:12 PM, Rob Herring wrote: > The early FDT scanning functions use of_scan_flat_dt() which implements > its own node walking method. This function predates libfdt and is an > unnecessary indirection. This series reworks > early_init_dt_scan_chosen(), early_init_dt_scan_root(), and > early_init_dt_scan_memory() to be called directly and use libfdt calls. > > Ultimately, I want to remove of_scan_flat_dt(). Most of the remaining > of_scan_flat_dt() users are in powerpc. > > Rob > > > Rob Herring (3): > of/fdt: Rework early_init_dt_scan_chosen() to call directly > of/fdt: Rework early_init_dt_scan_root() to call directly > of/fdt: Rework early_init_dt_scan_memory() to call directly > > arch/mips/ralink/of.c| 16 +--- > arch/powerpc/kernel/prom.c | 22 ++--- > arch/powerpc/mm/nohash/kaslr_booke.c | 4 +- > drivers/of/fdt.c | 121 ++- > include/linux/of_fdt.h | 9 +- > 5 files changed, 79 insertions(+), 93 deletions(-) > "checkpatch --strict" reports some "CHECK" issues, but review of the patches for correctness becomes much more difficult if they are addressed, so they should be ignored for this series. -Frank
Re: [PATCH 3/3] of/fdt: Rework early_init_dt_scan_memory() to call directly
On 11/18/21 1:12 PM, Rob Herring wrote: > Use of the of_scan_flat_dt() function predates libfdt and is discouraged > as libfdt provides a nicer set of APIs. Rework > early_init_dt_scan_memory() to be called directly and use libfdt. > > Cc: John Crispin > Cc: Thomas Bogendoerfer > Cc: Michael Ellerman > Cc: Benjamin Herrenschmidt > Cc: Paul Mackerras > Cc: Frank Rowand > Cc: linux-m...@vger.kernel.org > Cc: linuxppc-dev@lists.ozlabs.org > Signed-off-by: Rob Herring > --- > arch/mips/ralink/of.c | 16 ++--- > arch/powerpc/kernel/prom.c | 16 - > drivers/of/fdt.c | 68 -- > include/linux/of_fdt.h | 3 +- > 4 files changed, 47 insertions(+), 56 deletions(-) > > diff --git a/arch/mips/ralink/of.c b/arch/mips/ralink/of.c > index 0135376c5de5..e1d79523343a 100644 > --- a/arch/mips/ralink/of.c > +++ b/arch/mips/ralink/of.c > @@ -53,17 +53,6 @@ void __init device_tree_init(void) > unflatten_and_copy_device_tree(); > } > > -static int memory_dtb; > - > -static int __init early_init_dt_find_memory(unsigned long node, > - const char *uname, int depth, void *data) > -{ > - if (depth == 1 && !strcmp(uname, "memory@0")) > - memory_dtb = 1; > - > - return 0; > -} > - > void __init plat_mem_setup(void) > { > void *dtb; > @@ -77,9 +66,8 @@ void __init plat_mem_setup(void) > dtb = get_fdt(); > __dt_setup_arch(dtb); > > - of_scan_flat_dt(early_init_dt_find_memory, NULL); > - if (memory_dtb) > - of_scan_flat_dt(early_init_dt_scan_memory, NULL); > + if (!early_init_dt_scan_memory()) > + return; > else if (soc_info.mem_detect) The previous chunk is now: if (XXX) return; instead of: if (XXX) YYY(); so "else if (soc_info.mem_detect)" should be: if (soc_info.mem_detect) > soc_info.mem_detect(); > else if (soc_info.mem_size) > diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c > index 6e1a106f02eb..63762a3b75e8 100644 > --- a/arch/powerpc/kernel/prom.c > +++ b/arch/powerpc/kernel/prom.c > @@ -532,19 +532,19 @@ static int __init early_init_drmem_lmb(struct > drmem_lmb *lmb, > } > #endif /* CONFIG_PPC_PSERIES */ > > -static int __init early_init_dt_scan_memory_ppc(unsigned long node, > - const char *uname, > - int depth, void *data) > +static int __init early_init_dt_scan_memory_ppc(void) > { > #ifdef CONFIG_PPC_PSERIES > - if (depth == 1 && > - strcmp(uname, "ibm,dynamic-reconfiguration-memory") == 0) { > + const void *fdt = initial_boot_params; > + int node = fdt_path_offset(fdt, "/ibm,dynamic-reconfiguration-memory"); > + > + if (node > 0) { > walk_drmem_lmbs_early(node, NULL, early_init_drmem_lmb); > return 0; > } > #endif > > - return early_init_dt_scan_memory(node, uname, depth, data); > + return early_init_dt_scan_memory(); > } > > /* > @@ -749,7 +749,7 @@ void __init early_init_devtree(void *params) > > /* Scan memory nodes and rebuild MEMBLOCKs */ > early_init_dt_scan_root(); > - of_scan_flat_dt(early_init_dt_scan_memory_ppc, NULL); > + early_init_dt_scan_memory_ppc(); > > parse_early_param(); > > @@ -858,7 +858,7 @@ void __init early_get_first_memblock_info(void *params, > phys_addr_t *size) >*/ > add_mem_to_memblock = 0; > early_init_dt_scan_root(); > - of_scan_flat_dt(early_init_dt_scan_memory_ppc, NULL); > + early_init_dt_scan_memory_ppc(); > add_mem_to_memblock = 1; > > if (size) > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c > index 5e216555fe4f..a799117886f4 100644 > --- a/drivers/of/fdt.c > +++ b/drivers/of/fdt.c > @@ -1078,49 +1078,53 @@ u64 __init dt_mem_next_cell(int s, const __be32 > **cellp) > /* > * early_init_dt_scan_memory - Look for and parse memory nodes > */ > -int __init early_init_dt_scan_memory(unsigned long node, const char *uname, > - int depth, void *data) > +int __init early_init_dt_scan_memory(void) > { > - const char *type = of_get_flat_dt_prop(node, "device_type", NULL); > - const __be32 *reg, *endp; > - int l; > - bool hotpluggable; > + int node; > + const void *fdt = initial_boot_params; > > - /* We are scanning "memory" nodes only */ > - if (type == NULL || s
Re: [PATCH 2/3] of/fdt: Rework early_init_dt_scan_root() to call directly
On 11/18/21 1:12 PM, Rob Herring wrote: > Use of the of_scan_flat_dt() function predates libfdt and is discouraged > as libfdt provides a nicer set of APIs. Rework early_init_dt_scan_root() > to be called directly and use libfdt. > > Cc: Michael Ellerman > Cc: Benjamin Herrenschmidt > Cc: Paul Mackerras > Cc: Frank Rowand > Cc: linuxppc-dev@lists.ozlabs.org > Signed-off-by: Rob Herring > --- > arch/powerpc/kernel/prom.c | 4 ++-- > drivers/of/fdt.c | 14 +++--- > include/linux/of_fdt.h | 3 +-- > 3 files changed, 10 insertions(+), 11 deletions(-) > > diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c > index c6c398ccd98a..6e1a106f02eb 100644 > --- a/arch/powerpc/kernel/prom.c > +++ b/arch/powerpc/kernel/prom.c > @@ -748,7 +748,7 @@ void __init early_init_devtree(void *params) > of_scan_flat_dt(early_init_dt_scan_chosen_ppc, boot_command_line); > > /* Scan memory nodes and rebuild MEMBLOCKs */ > - of_scan_flat_dt(early_init_dt_scan_root, NULL); > + early_init_dt_scan_root(); > of_scan_flat_dt(early_init_dt_scan_memory_ppc, NULL); > > parse_early_param(); > @@ -857,7 +857,7 @@ void __init early_get_first_memblock_info(void *params, > phys_addr_t *size) >* mess the memblock. >*/ > add_mem_to_memblock = 0; > - of_scan_flat_dt(early_init_dt_scan_root, NULL); > + early_init_dt_scan_root(); > of_scan_flat_dt(early_init_dt_scan_memory_ppc, NULL); > add_mem_to_memblock = 1; > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c > index 1f1705f76263..5e216555fe4f 100644 > --- a/drivers/of/fdt.c > +++ b/drivers/of/fdt.c > @@ -1042,13 +1042,14 @@ int __init early_init_dt_scan_chosen_stdout(void) > /* > * early_init_dt_scan_root - fetch the top level address and size cells > */ > -int __init early_init_dt_scan_root(unsigned long node, const char *uname, > -int depth, void *data) > +int __init early_init_dt_scan_root(void) > { > const __be32 *prop; > + const void *fdt = initial_boot_params; > + int node = fdt_path_offset(fdt, "/"); > > - if (depth != 0) > - return 0; > + if (node < 0) > + return -ENODEV; > > dt_root_size_cells = OF_ROOT_NODE_SIZE_CELLS_DEFAULT; > dt_root_addr_cells = OF_ROOT_NODE_ADDR_CELLS_DEFAULT; > @@ -1063,8 +1064,7 @@ int __init early_init_dt_scan_root(unsigned long node, > const char *uname, > dt_root_addr_cells = be32_to_cpup(prop); > pr_debug("dt_root_addr_cells = %x\n", dt_root_addr_cells); > > - /* break now */ > - return 1; > + return 0; > } > > u64 __init dt_mem_next_cell(int s, const __be32 **cellp) > @@ -1263,7 +1263,7 @@ void __init early_init_dt_scan_nodes(void) > int rc; > > /* Initialize {size,address}-cells info */ > - of_scan_flat_dt(early_init_dt_scan_root, NULL); > + early_init_dt_scan_root(); > > /* Retrieve various information from the /chosen node */ > rc = early_init_dt_scan_chosen(boot_command_line); > diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h > index 654722235df6..df3d31926c3c 100644 > --- a/include/linux/of_fdt.h > +++ b/include/linux/of_fdt.h > @@ -68,8 +68,7 @@ extern void early_init_dt_add_memory_arch(u64 base, u64 > size); > extern u64 dt_mem_next_cell(int s, const __be32 **cellp); > > /* Early flat tree scan hooks */ > -extern int early_init_dt_scan_root(unsigned long node, const char *uname, > -int depth, void *data); > +extern int early_init_dt_scan_root(void); > > extern bool early_init_dt_scan(void *params); > extern bool early_init_dt_verify(void *params); > Reviewed-by: Frank Rowand
Re: [PATCH 1/3] of/fdt: Rework early_init_dt_scan_chosen() to call directly
On 11/18/21 1:12 PM, Rob Herring wrote: > Use of the of_scan_flat_dt() function predates libfdt and is discouraged > as libfdt provides a nicer set of APIs. Rework > early_init_dt_scan_chosen() to be called directly and use libfdt. > > Cc: Michael Ellerman > Cc: Benjamin Herrenschmidt > Cc: Paul Mackerras > Cc: Frank Rowand > Cc: linuxppc-dev@lists.ozlabs.org > Signed-off-by: Rob Herring > --- > arch/powerpc/kernel/prom.c | 2 +- > arch/powerpc/mm/nohash/kaslr_booke.c | 4 +-- > drivers/of/fdt.c | 39 ++-- > include/linux/of_fdt.h | 3 +-- > 4 files changed, 22 insertions(+), 26 deletions(-) > > diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c > index 2e67588f6f6e..c6c398ccd98a 100644 > --- a/arch/powerpc/kernel/prom.c > +++ b/arch/powerpc/kernel/prom.c > @@ -402,7 +402,7 @@ static int __init early_init_dt_scan_chosen_ppc(unsigned > long node, > const unsigned long *lprop; /* All these set by kernel, so no need to > convert endian */ > > /* Use common scan routine to determine if this is the chosen node */ > - if (early_init_dt_scan_chosen(node, uname, depth, data) == 0) > + if (early_init_dt_scan_chosen(data) < 0) > return 0; > > #ifdef CONFIG_PPC64 > diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c > b/arch/powerpc/mm/nohash/kaslr_booke.c > index 8fc49b1b4a91..90debe19ab4c 100644 > --- a/arch/powerpc/mm/nohash/kaslr_booke.c > +++ b/arch/powerpc/mm/nohash/kaslr_booke.c > @@ -44,9 +44,7 @@ struct regions __initdata regions; > > static __init void kaslr_get_cmdline(void *fdt) > { > - int node = fdt_path_offset(fdt, "/chosen"); > - > - early_init_dt_scan_chosen(node, "chosen", 1, boot_command_line); > + early_init_dt_scan_chosen(boot_command_line); > } > > static unsigned long __init rotate_xor(unsigned long hash, const void *area, > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c > index bdca35284ceb..1f1705f76263 100644 > --- a/drivers/of/fdt.c > +++ b/drivers/of/fdt.c > @@ -1124,18 +1124,18 @@ int __init early_init_dt_scan_memory(unsigned long > node, const char *uname, > return 0; > } > > -int __init early_init_dt_scan_chosen(unsigned long node, const char *uname, > - int depth, void *data) > +int __init early_init_dt_scan_chosen(char *cmdline) > { > - int l; > + int l, node; > const char *p; > const void *rng_seed; > + const void *fdt = initial_boot_params; > > - pr_debug("search \"chosen\", depth: %d, uname: %s\n", depth, uname); > - > - if (depth != 1 || !data || > - (strcmp(uname, "chosen") != 0 && strcmp(uname, "chosen@0") != 0)) > - return 0; > + node = fdt_path_offset(fdt, "/chosen"); > + if (node < 0) > + node = fdt_path_offset(fdt, "/chosen@0"); > + if (node < 0) > + return -ENOENT; > > early_init_dt_check_for_initrd(node); > early_init_dt_check_for_elfcorehdr(node); > @@ -1144,7 +1144,7 @@ int __init early_init_dt_scan_chosen(unsigned long > node, const char *uname, > /* Retrieve command line */ > p = of_get_flat_dt_prop(node, "bootargs", &l); > if (p != NULL && l > 0) > - strlcpy(data, p, min(l, COMMAND_LINE_SIZE)); > + strlcpy(cmdline, p, min(l, COMMAND_LINE_SIZE)); > > /* >* CONFIG_CMDLINE is meant to be a default in case nothing else > @@ -1153,18 +1153,18 @@ int __init early_init_dt_scan_chosen(unsigned long > node, const char *uname, >*/ > #ifdef CONFIG_CMDLINE > #if defined(CONFIG_CMDLINE_EXTEND) > - strlcat(data, " ", COMMAND_LINE_SIZE); > - strlcat(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE); > + strlcat(cmdline, " ", COMMAND_LINE_SIZE); > + strlcat(cmdline, CONFIG_CMDLINE, COMMAND_LINE_SIZE); > #elif defined(CONFIG_CMDLINE_FORCE) > - strlcpy(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE); > + strlcpy(cmdline, CONFIG_CMDLINE, COMMAND_LINE_SIZE); > #else > /* No arguments from boot loader, use kernel's cmdl*/ > - if (!((char *)data)[0]) > - strlcpy(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE); > + if (!((char *)cmdline)[0]) > + strlcpy(cmdline, CONFIG_CMDLINE, COMMAND_LINE_SIZE); > #endif > #endif /* CONFIG_CMDLINE */ > > - pr_debug("Command line is: %s\n", (char *)data); > + pr_debug("Command line is: %s\n", (char *)cmdline); > > rng_s
Re: [PATCH/RFC] of: Shrink struct of_device_id
Hi Geert, On 11/10/21 11:23 AM, Geert Uytterhoeven wrote: > Currently struct of_device_id is 196 (32-bit) or 200 (64-bit) bytes > large. It contains fixed-size strings for a name, a type, and a > compatible value, but the first two are barely used. > OF device ID tables contain multiple entries, plus an empty sentinel > entry. > > Statistics for my current kernel source tree: > - 4487 tables with 16836 entries (3367200 bytes) > - 176 names (average 6.7 max 23 chars) > - 66 types (average 5.1 max 21 chars) > - 12192 compatible values (average 18.0 max 45 chars) > Taking into account the minimum needed size to store the strings, only > 6.9% of the allocated space is used... I like the idea of using less memory (and thank you for the above data!), but I do not like the implementation, which reduces the size (of name at least - I didn't check each field) to less than what the standard allows. I have an idea of another way to accomplish the same goal, but I need to dig a bit to make sure I'm not shooting myself in the foot. -Frank > > Reduce kernel size by reducing the sizes of the fixed strings by one > half. > > This reduces the size of an ARM multi_v7_defconfig kernel by ca. 400 > KiB. For a typical kernel supporting a single board, you can expect to > save 50-100 KiB. > > Signed-off-by: Geert Uytterhoeven > --- > Notes: > - While gcc complains if the non-NUL characters in a string do not fit > in the available space, it does not complain if there is no space to > store the string's NUL-terminator. However, that should be caught > during testing, as the affected entry won't ever match. The kernel > won't crash, as such strings will still be terminated by the > sentinel in the table. > > - We could save even more by converting the strings from fixed-size > arrays to pointers, at the expense of making it harder to mark > entries __init. Given most drivers support binding and unbinding > and thus cannot use __init for of_device_id tables, perhaps that's > the way to go? > > Thanks for your comments! > --- > include/linux/mod_devicetable.h | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h > index ae2e75d15b219920..2bb2558d52d30d2b 100644 > --- a/include/linux/mod_devicetable.h > +++ b/include/linux/mod_devicetable.h > @@ -266,9 +266,9 @@ struct sdw_device_id { > * Struct used for matching a device > */ > struct of_device_id { > - charname[32]; > - chartype[32]; > - charcompatible[128]; > + charname[24]; > + chartype[24]; > + charcompatible[48]; > const void *data; > }; > >
Re: [Bug 206203] kmemleak reports various leaks in drivers/of/unittest.c
On 4/15/20 10:27 PM, Frank Rowand wrote: > On 4/8/20 10:22 AM, Frank Rowand wrote: >> Hi Michael, >> >> On 4/7/20 10:13 PM, Michael Ellerman wrote: >>> bugzilla-dae...@bugzilla.kernel.org writes: >>>> https://bugzilla.kernel.org/show_bug.cgi?id=206203 >>>> >>>> Erhard F. (erhar...@mailbox.org) changed: >>>> >>>>What|Removed |Added >>>> >>>> Attachment #286801|0 |1 >>>> is obsolete|| >>>> >>>> --- Comment #10 from Erhard F. (erhar...@mailbox.org) --- >>>> Created attachment 288189 >>>> --> https://bugzilla.kernel.org/attachment.cgi?id=288189&action=edit >>>> kmemleak output (kernel 5.6.2, Talos II) >>> >>> These are all in or triggered by the of unittest code AFAICS. >>> Content of the log reproduced below. >>> >>> Frank/Rob, are these memory leaks expected? >> >> Thanks for the report. I'll look at each one. > > Only one of the leaks was expected. I have patches to fix the > unexpected leaks and to remove the expected leak so that the > kmemleak report of it will not have to be checked again. > > I expect to send the patch series tomorrow (Thursday). The patches for the memory leaks that I saw on an ARM board are at: https://lore.kernel.org/r/1587073370-25963-1-git-send-email-frowand.l...@gmail.com -Frank < snip >
Re: [Bug 206203] kmemleak reports various leaks in drivers/of/unittest.c
On 4/8/20 10:22 AM, Frank Rowand wrote: > Hi Michael, > > On 4/7/20 10:13 PM, Michael Ellerman wrote: >> bugzilla-dae...@bugzilla.kernel.org writes: >>> https://bugzilla.kernel.org/show_bug.cgi?id=206203 >>> >>> Erhard F. (erhar...@mailbox.org) changed: >>> >>>What|Removed |Added >>> >>> Attachment #286801|0 |1 >>> is obsolete|| >>> >>> --- Comment #10 from Erhard F. (erhar...@mailbox.org) --- >>> Created attachment 288189 >>> --> https://bugzilla.kernel.org/attachment.cgi?id=288189&action=edit >>> kmemleak output (kernel 5.6.2, Talos II) >> >> These are all in or triggered by the of unittest code AFAICS. >> Content of the log reproduced below. >> >> Frank/Rob, are these memory leaks expected? > > Thanks for the report. I'll look at each one. Only one of the leaks was expected. I have patches to fix the unexpected leaks and to remove the expected leak so that the kmemleak report of it will not have to be checked again. I expect to send the patch series tomorrow (Thursday). -Frank > > -Frank > > >> >> cheers >> >> >> unreferenced object 0xc007eb89ca58 (size 192): >> comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s) >> hex dump (first 32 bytes): >> c0 00 00 00 00 d9 21 38 00 00 00 00 00 00 00 00 ..!8 >> c0 00 00 07 ec 97 80 08 00 00 00 00 00 00 00 00 >> backtrace: >> [<07b50c76>] .__of_node_dup+0x38/0x1c0 >> [<e2115f4f>] .of_unittest_changeset+0x13c/0xa20 >> [<925a8013>] .of_unittest+0x1ba0/0x3778 >> [<af117d89>] .do_one_initcall+0x7c/0x420 >> [<c99776b4>] .kernel_init_freeable+0x318/0x3d8 >> [<01b957ee>] .kernel_init+0x14/0x168 >> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68 >> unreferenced object 0xc007ec978008 (size 8): >> comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s) >> hex dump (first 8 bytes): >> 6e 31 00 6b 6b 6b 6b a5 n1.. >> backtrace: >> [<c939be68>] .kstrdup+0x44/0xb0 >> [<a074532f>] .__of_node_dup+0x50/0x1c0 >> [<e2115f4f>] .of_unittest_changeset+0x13c/0xa20 >> [<925a8013>] .of_unittest+0x1ba0/0x3778 >> [<af117d89>] .do_one_initcall+0x7c/0x420 >> [<c99776b4>] .kernel_init_freeable+0x318/0x3d8 >> [<01b957ee>] .kernel_init+0x14/0x168 >> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68 >> unreferenced object 0xc007eb89e318 (size 192): >> comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s) >> hex dump (first 32 bytes): >> c0 00 00 00 00 d9 21 38 00 00 00 00 00 00 00 00 ..!8 >> c0 00 00 07 ec 97 ab 08 00 00 00 00 00 00 00 00 >> backtrace: >> [<07b50c76>] .__of_node_dup+0x38/0x1c0 >> [<881dc9c4>] .of_unittest_changeset+0x194/0xa20 >> [<925a8013>] .of_unittest+0x1ba0/0x3778 >> [<af117d89>] .do_one_initcall+0x7c/0x420 >> [<c99776b4>] .kernel_init_freeable+0x318/0x3d8 >> [<01b957ee>] .kernel_init+0x14/0x168 >> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68 >> unreferenced object 0xc007ec97ab08 (size 8): >> comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s) >> hex dump (first 8 bytes): >> 6e 32 00 6b 6b 6b 6b a5 n2.. >> backtrace: >> [<c939be68>] .kstrdup+0x44/0xb0 >> [<a074532f>] .__of_node_dup+0x50/0x1c0 >> [<881dc9c4>] .of_unittest_changeset+0x194/0xa20 >> [<925a8013>] .of_unittest+0x1ba0/0x3778 >> [<af117d89>] .do_one_initcall+0x7c/0x420 >> [<c99776b4>] .kernel_init_freeable+0x318/0x3d8 >> [<01b957ee>] .kernel_init+0x14/0x168 >> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68 >> unreferenced object 0xc007eb89e528 (size 192): >> comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s) >> hex dump (first 32 bytes): >> c0 00 00 07 ec 97 bd d8 00 00 00 00 00 00 00 00 ..
Re: [Bug 206203] kmemleak reports various leaks in drivers/of/unittest.c
Hi Michael, On 4/7/20 10:13 PM, Michael Ellerman wrote: > bugzilla-dae...@bugzilla.kernel.org writes: >> https://bugzilla.kernel.org/show_bug.cgi?id=206203 >> >> Erhard F. (erhar...@mailbox.org) changed: >> >>What|Removed |Added >> >> Attachment #286801|0 |1 >> is obsolete|| >> >> --- Comment #10 from Erhard F. (erhar...@mailbox.org) --- >> Created attachment 288189 >> --> https://bugzilla.kernel.org/attachment.cgi?id=288189&action=edit >> kmemleak output (kernel 5.6.2, Talos II) > > These are all in or triggered by the of unittest code AFAICS. > Content of the log reproduced below. > > Frank/Rob, are these memory leaks expected? Thanks for the report. I'll look at each one. -Frank > > cheers > > > unreferenced object 0xc007eb89ca58 (size 192): > comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s) > hex dump (first 32 bytes): > c0 00 00 00 00 d9 21 38 00 00 00 00 00 00 00 00 ..!8 > c0 00 00 07 ec 97 80 08 00 00 00 00 00 00 00 00 > backtrace: > [<07b50c76>] .__of_node_dup+0x38/0x1c0 > [] .of_unittest_changeset+0x13c/0xa20 > [<925a8013>] .of_unittest+0x1ba0/0x3778 > [ ] .do_one_initcall+0x7c/0x420 > [ ] .kernel_init_freeable+0x318/0x3d8 > [<01b957ee>] .kernel_init+0x14/0x168 > [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68 > unreferenced object 0xc007ec978008 (size 8): > comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s) > hex dump (first 8 bytes): > 6e 31 00 6b 6b 6b 6b a5 n1.. > backtrace: > [ ] .kstrdup+0x44/0xb0 > [ ] .__of_node_dup+0x50/0x1c0 > [ ] .of_unittest_changeset+0x13c/0xa20 > [<925a8013>] .of_unittest+0x1ba0/0x3778 > [ ] .do_one_initcall+0x7c/0x420 > [ ] .kernel_init_freeable+0x318/0x3d8 > [<01b957ee>] .kernel_init+0x14/0x168 > [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68 > unreferenced object 0xc007eb89e318 (size 192): > comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s) > hex dump (first 32 bytes): > c0 00 00 00 00 d9 21 38 00 00 00 00 00 00 00 00 ..!8 > c0 00 00 07 ec 97 ab 08 00 00 00 00 00 00 00 00 > backtrace: > [<07b50c76>] .__of_node_dup+0x38/0x1c0 > [<881dc9c4>] .of_unittest_changeset+0x194/0xa20 > [<925a8013>] .of_unittest+0x1ba0/0x3778 > [ ] .do_one_initcall+0x7c/0x420 > [ ] .kernel_init_freeable+0x318/0x3d8 > [<01b957ee>] .kernel_init+0x14/0x168 > [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68 > unreferenced object 0xc007ec97ab08 (size 8): > comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s) > hex dump (first 8 bytes): > 6e 32 00 6b 6b 6b 6b a5 n2.. > backtrace: > [ ] .kstrdup+0x44/0xb0 > [ ] .__of_node_dup+0x50/0x1c0 > [<881dc9c4>] .of_unittest_changeset+0x194/0xa20 > [<925a8013>] .of_unittest+0x1ba0/0x3778 > [ ] .do_one_initcall+0x7c/0x420 > [ ] .kernel_init_freeable+0x318/0x3d8 > [<01b957ee>] .kernel_init+0x14/0x168 > [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68 > unreferenced object 0xc007eb89e528 (size 192): > comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s) > hex dump (first 32 bytes): > c0 00 00 07 ec 97 bd d8 00 00 00 00 00 00 00 00 > c0 00 00 07 ec 97 b3 18 00 00 00 00 00 00 00 00 > backtrace: > [<07b50c76>] .__of_node_dup+0x38/0x1c0 > [ ] .of_unittest_changeset+0x1ec/0xa20 > [<925a8013>] .of_unittest+0x1ba0/0x3778 > [ ] .do_one_initcall+0x7c/0x420 > [ ] .kernel_init_freeable+0x318/0x3d8 > [<01b957ee>] .kernel_init+0x14/0x168 > [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68 > unreferenced object 0xc007ec97b318 (size 8): > comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s) > hex dump (first 8 bytes): > 6e 32 31 00 6b 6b 6b a5 n21.kkk. > backtrace: > [ ] .kstrdup+0x44/0xb0 > [ ] .__of_node_dup+0x50/0x1c0 > [ ] .of_unittest_changeset+0x1ec/0xa20 > [<925a8013>] .of_unittest+0x1ba0/0x3778 > [ ] .do_one_initcall+0x7c/0x420 > [ ] .kernel_init_freeable+0x318/0x3d8 > [<01b957ee>] .kernel_init+0x14/0x168 > [<1fe347b5>] .ret_from_kernel_thread+0x58/0x
Re: [PATCH] dt: Remove booting-without-of.txt
On 3/4/20 12:45 PM, Rob Herring wrote: > Well, not quite removed yet... Mauro is looking at moving this to ReST, > but I think it would be better to trim or remove it. > > boot-without-of.txt is an ancient document that first outlined > Flattened DeviceTree. The DT world has evolved a lot in the 15 years > since and boot-without-of.txt is pretty stale. The name of the document > itself is confusing if you don't understand the evolution from real > 'OpenFirmware'. Much of what booting-without-of.txt contains is now in > the DT specification (which evolved out of the ePAPR). > > This is a first pass of removing everything that has a DT spec > equivalent or is no longer standard practice (e.g. soc for SoC > nodes) in order to see what's left. This is what I have: > > TODO > - Move boot interface details to arch specific docs > - Document 'serial-number' property in DT spec > - Document the 'hotpluggable' memory property in DT spec > - Document the 'sleep' property (PPC only) > - Document the 'dma-coherent' property in DT spec > - Need the history of node names and 'name' property? > - Need how addresses work? Looks good. Since this is a first pass, I'm expecting that polishing (things like updating section numbers) would happen in subsequent patches after more of the content changes are done, so no need to do so in this patch. Reviewed-by: Frank Rowand -Frank > > Cc: Frank Rowand > Cc: Mauro Carvalho Chehab > Cc: Benjamin Herrenschmidt > Cc: Geert Uytterhoeven > Cc: Michael Ellerman > Cc: linuxppc-dev@lists.ozlabs.org > Signed-off-by: Rob Herring > --- > .../devicetree/booting-without-of.txt | 1027 + > 1 file changed, 1 insertion(+), 1026 deletions(-) > > diff --git a/Documentation/devicetree/booting-without-of.txt > b/Documentation/devicetree/booting-without-of.txt > index 4660ccee35a3..97beee828ba4 100644 > --- a/Documentation/devicetree/booting-without-of.txt > +++ b/Documentation/devicetree/booting-without-of.txt > @@ -19,44 +19,17 @@ Table of Contents > 5) Entry point for arch/sh > >II - The DT block format > -1) Header > 2) Device tree generalities > -3) Device tree "structure" block > -4) Device tree "strings" block > >III - Required content of the device tree > 1) Note about cells and address representation > -2) Note about "compatible" properties > -3) Note about "name" properties > -4) Note about node and property names and character set > 5) Required nodes and properties >a) The root node > - b) The /cpus node > - c) The /cpus/* nodes > - d) the /memory node(s) > - e) The /chosen node > - f) the /soc node > - > - IV - "dtc", the device tree compiler > - > - V - Recommendations for a bootloader > - > - VI - System-on-a-chip devices and nodes > -1) Defining child nodes of an SOC > -2) Representing devices without a current OF specification > - > - VII - Specifying interrupt information for devices > -1) interrupts property > -2) interrupt-parent property > -3) OpenPIC Interrupt Controllers > -4) ISA Interrupt Controllers > >VIII - Specifying device power management information (sleep property) > >IX - Specifying dma bus information > > - Appendix A - Sample SOC node for MPC8540 > - > > Revision Information > > @@ -105,19 +78,6 @@ Revision Information >- Added chapter VI > > > - ToDo: > - - Add some definitions of interrupt tree (simple/complex) > - - Add some definitions for PCI host bridges > - - Add some common address format examples > - - Add definitions for standard properties and "compatible" > - names for cells that are not already defined by the existing > - OF spec. > - - Compare FSL SOC use of PCI to standard and make sure no new > - node definition required. > - - Add more information about node definitions for SOC devices > - that currently have no standard, like the FSL CPM. > - > - > I - Introduction > > > @@ -333,196 +293,17 @@ II - The DT block format > > > > -This chapter defines the actual format of the flattened device-tree > -passed to the kernel. The actual content of it and kernel requirements > -are described later. You can find example of code manipulating that > -format in various places, including arch/powerpc/kernel/prom_init.c > -which will generate a flattened device-tree from the
Re: [PATCH] of: Add OF_DMA_DEFAULT_COHERENT & select it on powerpc
+ Frank (me) On 1/26/20 5:52 AM, Michael Ellerman wrote: > There's an OF helper called of_dma_is_coherent(), which checks if a > device has a "dma-coherent" property to see if the device is coherent > for DMA. > > But on some platforms devices are coherent by default, and on some > platforms it's not possible to update existing device trees to add the > "dma-coherent" property. > > So add a Kconfig symbol to allow arch code to tell > of_dma_is_coherent() that devices are coherent by default, regardless > of the presence of the property. > > Select that symbol on powerpc when NOT_COHERENT_CACHE is not set, ie. > when the system has a coherent cache. > > Fixes: 92ea637edea3 ("of: introduce of_dma_is_coherent() helper") > Cc: sta...@vger.kernel.org # v3.16+ > Reported-by: Christian Zigotzky > Tested-by: Christian Zigotzky > Signed-off-by: Michael Ellerman > --- > arch/powerpc/Kconfig | 1 + > drivers/of/Kconfig | 4 > drivers/of/address.c | 6 +- > 3 files changed, 10 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > index 1ec34e16ed65..19f5aa8ac9a3 100644 > --- a/arch/powerpc/Kconfig > +++ b/arch/powerpc/Kconfig > @@ -238,6 +238,7 @@ config PPC > select NEED_DMA_MAP_STATE if PPC64 || NOT_COHERENT_CACHE > select NEED_SG_DMA_LENGTH > select OF > + select OF_DMA_DEFAULT_COHERENT if !NOT_COHERENT_CACHE > select OF_EARLY_FLATTREE > select OLD_SIGACTIONif PPC32 > select OLD_SIGSUSPEND > diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig > index 37c2ccbefecd..d91618641be6 100644 > --- a/drivers/of/Kconfig > +++ b/drivers/of/Kconfig > @@ -103,4 +103,8 @@ config OF_OVERLAY > config OF_NUMA > bool > > +config OF_DMA_DEFAULT_COHERENT > + # arches should select this if DMA is coherent by default for OF devices > + bool > + > endif # OF > diff --git a/drivers/of/address.c b/drivers/of/address.c > index 99c1b8058559..e8a39c3ec4d4 100644 > --- a/drivers/of/address.c > +++ b/drivers/of/address.c > @@ -995,12 +995,16 @@ int of_dma_get_range(struct device_node *np, u64 > *dma_addr, u64 *paddr, u64 *siz > * @np: device node > * > * It returns true if "dma-coherent" property was found > - * for this device in DT. > + * for this device in the DT, or if DMA is coherent by > + * default for OF devices on the current platform. > */ > bool of_dma_is_coherent(struct device_node *np) > { > struct device_node *node = of_node_get(np); > > + if (IS_ENABLED(CONFIG_OF_DMA_DEFAULT_COHERENT)) > + return true; > + > while (node) { > if (of_property_read_bool(node, "dma-coherent")) { > of_node_put(node); >
Re: [RFC] Efficiency of the phandle_cache on ppc64/SLOF
On 12/10/19 2:17 AM, Frank Rowand wrote: > On 12/9/19 7:51 PM, Rob Herring wrote: >> On Mon, Dec 9, 2019 at 7:35 AM Sebastian Andrzej Siewior >> wrote: >>> >>> On 2019-12-05 20:01:41 [-0600], Frank Rowand wrote: >>>> Is there a memory usage issue for the systems that led to this thread? >>> >>> No, no memory issue led to this thread. I was just testing my patch and >>> I assumed that I did something wrong in the counting/lock drop/lock >>> acquire/allocate path because the array was hardly used. So I started to >>> look deeper… >>> Once I figured out everything was fine, I was curious if everyone is >>> aware of the different phandle creation by dtc vs POWER. And I posted >>> the mail in the thread. >>> Once you confirmed that everything is "known / not an issue" I was ready >>> to take off [0]. >>> >>> Later more replies came in such as one mail [1] from Rob describing the >>> original reason with 814 phandles. _Here_ I was just surprised that 1024 >>> were used over 64 entries for a benefit of 60ms. I understand that this >>> is low concern for you because that memory is released if modules are >>> not enabled. I usually see that module support is left enabled. >>> >>> However, Rob suggested / asked about the fixed size array (this is how I >>> understood it): >>> |And yes, as mentioned earlier I don't like the complexity. I didn't >>> |from the start and I'm I'm still of the opinion we should have a >>> |fixed or 1 time sized true cache (i.e. smaller than total # of >>> |phandles). That would solve the RT memory allocation and locking issue >>> |too. >>> >>> so I attempted to ask if we should have the fixed size array maybe >>> with the hash_32() instead the mask. This would make my other patch >>> obsolete because the fixed size array should not have a RT issue. The >>> hash_32() part here would address the POWER issue where the cache is >>> currently not used efficiently. >>> >>> If you want instead to keep things as-is then this is okay from my side. >>> If you want to keep this cache off on POWER then I could contribute a >>> patch doing so. >> >> It turns out there's actually a bug in the current implementation. If >> we have multiple phandles with the same mask, then we leak node >> references if we miss in the cache and re-assign the cache entry. > > Aaargh. Patch sent. > >> Easily fixed I suppose, but holding a ref count for a cached entry >> seems wrong. That means we never have a ref count of 0 on every node >> with a phandle. > > It will go to zero when the cache is freed. > > My memory is that we free the cache as part of removing an overlay. I'll > verify whether my memory is correct. And I'll look at non-overlay use of dynamic devicetree too. -Frank > > -Frank > > >> >> I've done some more experiments with the performance. I've come to the >> conclusion that just measuring boot time is not a good way at least if >> the time is not a significant percentage of the total. I couldn't get >> any measurable data. I'm using a RK3399 Rock960 board. It has 190 >> phandles. I get about 1500 calls to of_find_node_by_phandle() during >> boot. Note that about the first 300 are before we have any timekeeping >> (the prior measurements also would not account for this). Those calls >> have no cache in the current implementation and are cached in my >> implementation. >> >> no cache: 20124 us >> current cache: 819 us >> >> new cache (64 entry): 4342 us >> new cache (128 entry): 2875 us >> new cache (256 entry): 1235 us >> >> To get some idea on the time spent before timekeeping is up, if we >> take the avg miss time is ~17us (20124/1200), then we're spending >> about ~5ms on lookups before the cache is enabled. I'd estimate the >> new cache takes ~400us before timekeeping is up as there's 11 misses >> early. >> >> >From these numbers, it seems the miss rate has a significant impact on >> performance for the different sizes. But taken from the original 20+ >> ms, they all look like good improvement. >> >> Rob >> > >
Re: [RFC] Efficiency of the phandle_cache on ppc64/SLOF
On 12/9/19 7:51 PM, Rob Herring wrote: > On Mon, Dec 9, 2019 at 7:35 AM Sebastian Andrzej Siewior > wrote: >> >> On 2019-12-05 20:01:41 [-0600], Frank Rowand wrote: >>> Is there a memory usage issue for the systems that led to this thread? >> >> No, no memory issue led to this thread. I was just testing my patch and >> I assumed that I did something wrong in the counting/lock drop/lock >> acquire/allocate path because the array was hardly used. So I started to >> look deeper… >> Once I figured out everything was fine, I was curious if everyone is >> aware of the different phandle creation by dtc vs POWER. And I posted >> the mail in the thread. >> Once you confirmed that everything is "known / not an issue" I was ready >> to take off [0]. >> >> Later more replies came in such as one mail [1] from Rob describing the >> original reason with 814 phandles. _Here_ I was just surprised that 1024 >> were used over 64 entries for a benefit of 60ms. I understand that this >> is low concern for you because that memory is released if modules are >> not enabled. I usually see that module support is left enabled. >> >> However, Rob suggested / asked about the fixed size array (this is how I >> understood it): >> |And yes, as mentioned earlier I don't like the complexity. I didn't >> |from the start and I'm I'm still of the opinion we should have a >> |fixed or 1 time sized true cache (i.e. smaller than total # of >> |phandles). That would solve the RT memory allocation and locking issue >> |too. >> >> so I attempted to ask if we should have the fixed size array maybe >> with the hash_32() instead the mask. This would make my other patch >> obsolete because the fixed size array should not have a RT issue. The >> hash_32() part here would address the POWER issue where the cache is >> currently not used efficiently. >> >> If you want instead to keep things as-is then this is okay from my side. >> If you want to keep this cache off on POWER then I could contribute a >> patch doing so. > > It turns out there's actually a bug in the current implementation. If > we have multiple phandles with the same mask, then we leak node > references if we miss in the cache and re-assign the cache entry. Aaargh. Patch sent. > Easily fixed I suppose, but holding a ref count for a cached entry > seems wrong. That means we never have a ref count of 0 on every node > with a phandle. It will go to zero when the cache is freed. My memory is that we free the cache as part of removing an overlay. I'll verify whether my memory is correct. -Frank > > I've done some more experiments with the performance. I've come to the > conclusion that just measuring boot time is not a good way at least if > the time is not a significant percentage of the total. I couldn't get > any measurable data. I'm using a RK3399 Rock960 board. It has 190 > phandles. I get about 1500 calls to of_find_node_by_phandle() during > boot. Note that about the first 300 are before we have any timekeeping > (the prior measurements also would not account for this). Those calls > have no cache in the current implementation and are cached in my > implementation. > > no cache: 20124 us > current cache: 819 us > > new cache (64 entry): 4342 us > new cache (128 entry): 2875 us > new cache (256 entry): 1235 us > > To get some idea on the time spent before timekeeping is up, if we > take the avg miss time is ~17us (20124/1200), then we're spending > about ~5ms on lookups before the cache is enabled. I'd estimate the > new cache takes ~400us before timekeeping is up as there's 11 misses > early. > >>From these numbers, it seems the miss rate has a significant impact on > performance for the different sizes. But taken from the original 20+ > ms, they all look like good improvement. > > Rob >
Re: [RFC] Efficiency of the phandle_cache on ppc64/SLOF
On 12/5/19 7:52 PM, Frank Rowand wrote: > On 12/3/19 10:56 AM, Rob Herring wrote: >> On Mon, Dec 2, 2019 at 10:28 PM Frank Rowand wrote: >>> >>> On 12/2/19 10:12 PM, Michael Ellerman wrote: >>>> Frank Rowand writes: >>>>> On 11/29/19 9:10 AM, Sebastian Andrzej Siewior wrote: >>>>>> I've been looking at phandle_cache and noticed the following: The raw >>>>>> phandle value as generated by dtc starts at zero and is incremented by >>>>>> one for each phandle entry. The qemu pSeries model is using Slof (which >>>>>> is probably the same thing as used on real hardware) and this looks like >>>>>> a poiner value for the phandle. >>>>>> With >>>>>> qemu-system-ppc64le -m 16G -machine pseries -smp 8 >>>>>> >>>>>> I got the following output: >>>>>> | entries: 64 >>>>>> | phandle 7e732468 slot 28 hash c >>>>>> | phandle 7e732ad0 slot 10 hash 27 >>>>>> | phandle 7e732ee8 slot 28 hash 3a >>>>>> | phandle 7e734160 slot 20 hash 36 >>>>>> | phandle 7e734318 slot 18 hash 3a >>>>>> | phandle 7e734428 slot 28 hash 33 >>>>>> | phandle 7e734538 slot 38 hash 2c >>>>>> | phandle 7e734850 slot 10 hash e >>>>>> | phandle 7e735220 slot 20 hash 2d >>>>>> | phandle 7e735bf0 slot 30 hash d >>>>>> | phandle 7e7365c0 slot 0 hash 2d >>>>>> | phandle 7e736f90 slot 10 hash d >>>>>> | phandle 7e737960 slot 20 hash 2d >>>>>> | phandle 7e738330 slot 30 hash d >>>>>> | phandle 7e738d00 slot 0 hash 2d >>>>>> | phandle 7e739730 slot 30 hash 38 >>>>>> | phandle 7e73bd08 slot 8 hash 17 >>>>>> | phandle 7e73c2e0 slot 20 hash 32 >>>>>> | phandle 7e73c7f8 slot 38 hash 37 >>>>>> | phandle 7e782420 slot 20 hash 13 >>>>>> | phandle 7e782ed8 slot 18 hash 1b >>>>>> | phandle 7e73ce28 slot 28 hash 39 >>>>>> | phandle 7e73d390 slot 10 hash 22 >>>>>> | phandle 7e73d9a8 slot 28 hash 1a >>>>>> | phandle 7e73dc28 slot 28 hash 37 >>>>>> | phandle 7e73de00 slot 0 hash a >>>>>> | phandle 7e73e028 slot 28 hash 0 >>>>>> | phandle 7e7621a8 slot 28 hash 36 >>>>>> | phandle 7e73e458 slot 18 hash 1e >>>>>> | phandle 7e73e608 slot 8 hash 1e >>>>>> | phandle 7e740078 slot 38 hash 28 >>>>>> | phandle 7e740180 slot 0 hash 1d >>>>>> | phandle 7e740240 slot 0 hash 33 >>>>>> | phandle 7e740348 slot 8 hash 29 >>>>>> | phandle 7e740410 slot 10 hash 2 >>>>>> | phandle 7e740eb0 slot 30 hash 3e >>>>>> | phandle 7e745390 slot 10 hash 33 >>>>>> | phandle 7e747b08 slot 8 hash c >>>>>> | phandle 7e748528 slot 28 hash f >>>>>> | phandle 7e74a6e0 slot 20 hash 18 >>>>>> | phandle 7e74aab0 slot 30 hash b >>>>>> | phandle 7e74f788 slot 8 hash d >>>>>> | Used entries: 8, hashed: 29 >>>>>> >>>>>> So the hash array has 64 entries out which only 8 are populated. Using >>>>>> hash_32() populates 29 entries. >>>>>> Could someone with real hardware verify this? >>>>>> I'm not sure how important this performance wise, it looks just like a >>>>>> waste using only 1/8 of the array. >>>>> >>>>> The hash used is based on the assumptions you noted, and as stated in the >>>>> code, that phandle property values are in a contiguous range of 1..n >>>>> (not starting from zero), which is what dtc generates. >>>> >>>>> We knew that for systems that do not match the assumptions that the hash >>>>> will not be optimal. >>>> >>>> If we're going to have the phandle cache it should at least make some >>>> attempt to work across the systems that Linux supports. >>>> >>>>> Unless there is a serious performance problem for >>>>> such systems, I do not want to make the phandle hash code more complicated >>>>> to optimize for these cases. And the pseries have been performing ok >>>>> without phandle related performance issues that I remember hearing since >>>>> before the cache wa
Re: [RFC] Efficiency of the phandle_cache on ppc64/SLOF
On 12/6/19 5:40 PM, Segher Boessenkool wrote: > Hi, > > On Thu, Dec 05, 2019 at 07:37:24PM -0600, Frank Rowand wrote: >> On 12/3/19 12:35 PM, Segher Boessenkool wrote: >>> Btw. Some OFs mangle the phandles some way, to make it easier to catch >>> people using it as an address (and similarly, mangle ihandles differently, >>> so you catch confusion between ihandles and phandles as well). Like a >>> simple xor, with some odd number preferably. You should assume *nothing* >>> about phandles, they are opaque identifiers. >> >> For arm32 machines that use dtc to generate the devicetree, which is a >> very large user base, we certainly can make assumptions about phandles. > > I was talking about OF. Phandles are explicitly defined to be opaque > tokens. If there is an extra meaning to them in flattened device trees, > well, the kernel should then only depend on that there, not for more > general phandles. Where is this documented btw? And dtc generated devicetrees are a huge proportion of the OF systems. It is not documented. As an aside, overlays also depend upon the current dtc implementation. If an extremely large value is used for a phandle then overlay application will fail. > >> Especially because the complaints about the overhead of phandle based >> lookups have been voiced by users of this specific set of machines. >> >> For systems with a devicetree that does not follow the assumptions, the >> phandle cache should not measurably increase the overhead of phandle >> based lookups. > > It's an extra memory access and extra code to execute, for not much gain > (if anything). While with a reasonable hash function it will be good > for everyone. > >> If you have measurements of a system where implementing the phandle >> cache increased the overhead, > > Are you seriously saying you think this code can run in zero time and > space on most systems? No. I made no such claim. Note the additional words in the following sentences. >> and the additional overhead is a concern >> (such as significantly increasing boot time) then please share that >> information with us. Otherwise this is just a theoretical exercise. > > The point is that this code could be easily beneficial for most (or all) > users, not just those that use dtc-constructed device trees. It is The point is that the cache was implemented to solve a specific problem for certain specific systems. There had been a few reports of various machines having the same issue, but finally someone measures a **significant** improvement in boot time for a specific machine. The boot time with the cache was **measured** to be much shorter. The boot time for all systems with a dtc generated devicetree is expected to be faster. No one responded to the implementation when it was proposed with a **measurement** that showed increased boot time. A concern of using more memory was raised and discussed, with at least on feature added as a result (freeing the cache in late init if modules are not enabled). Being "beneficial for most (or all) users" has to be balanced against whether the change would remove the benefit for the system that the feature was originally implemented to solve a problem for. There was no performance data supplied to answer this question. (Though eventually Rob did some measurements of the impact on hash efficiency for such a system.) > completely obvious that having a worse cache hash function results in > many more lookups. Whether that results in something expressed as > milliseconds on tiny systems or microseconds on bigger systems is > completely beside the point. There was no performance data accompanying the proposed change that started this thread. There was no data showing whether the systems that this feature was created for would suffer. There was no data showing that the boot time of the pseries systems would improve. There was no assertion made that too much memory was being used by the cache (there was an implied assertion that a large percentage of the memory used for the cache was not used, thus the performance benefit of the cache could be improved by changing to using a hash instead of mask). We had rejected creating a cache for several years until finally some solid data was provided showing an actual need for it. It is not a question of "milliseconds on tiny systems or microseconds on bigger systems". I agree with that. But it does matter whether the performance impact of various implementations is large enough to either solve a problem or create a problem. On the other hand, if the amount of memory used by the cache is a problem (which is _not_ what was asserted by the patch submitter) then we can have a conversation about how to resolve that. -Frank > > > Segher >
Re: [RFC] Efficiency of the phandle_cache on ppc64/SLOF
On 12/5/19 10:35 AM, Sebastian Andrzej Siewior wrote: > On 2019-12-03 10:56:35 [-0600], Rob Herring wrote: >>> Another possibility would be to make the cache be dependent >>> upon not CONFIG_PPC. It might be possible to disable the >>> cache with a minimal code change. >> >> I'd rather not do that. >> >> And yes, as mentioned earlier I don't like the complexity. I didn't >> from the start and I'm I'm still of the opinion we should have a >> fixed or 1 time sized true cache (i.e. smaller than total # of >> phandles). That would solve the RT memory allocation and locking issue >> too. >> >> For reference, the performance difference between the current >> implementation (assuming fixes haven't regressed it) was ~400ms vs. a >> ~340ms improvement with a 64 entry cache (using a mask, not a hash). >> IMO, 340ms improvement was good enough. > > Okay. So the 814 phandles would result in an array with 1024 slots. That > would need 4KiB of memory. Is this amount of memory an issue for this system? If module support is not configured into the kernel then the cache is removed and memory freed in a late initcall. I don't know if that helps your use case or not. > What about we go back to the fix 64 slots array but with hash32 for the > lookup? Without the hash we would be 60ms slower during boot (compared > to now, based on ancient data) but then the hash isn't expensive so we > end up with better coverage of the memory on systems which don't have a > plain enumeration of the phandle. That performance data is specific to one particular system. It does not generalize to all devicetree based systems. So don't draw too many conclusions from it. If you want to understand the boot performance impact for your system, you need to measure the alternatives on your system. Is there a memory usage issue for the systems that led to this thread? Unless there is a documented memory issue, I do not want to expand an issue with poor cache bucket percent utilization to the other issue of cache size. -Frank > >> Rob > > Sebastian >
Re: [RFC] Efficiency of the phandle_cache on ppc64/SLOF
On 12/3/19 10:56 AM, Rob Herring wrote: > On Mon, Dec 2, 2019 at 10:28 PM Frank Rowand wrote: >> >> On 12/2/19 10:12 PM, Michael Ellerman wrote: >>> Frank Rowand writes: >>>> On 11/29/19 9:10 AM, Sebastian Andrzej Siewior wrote: >>>>> I've been looking at phandle_cache and noticed the following: The raw >>>>> phandle value as generated by dtc starts at zero and is incremented by >>>>> one for each phandle entry. The qemu pSeries model is using Slof (which >>>>> is probably the same thing as used on real hardware) and this looks like >>>>> a poiner value for the phandle. >>>>> With >>>>> qemu-system-ppc64le -m 16G -machine pseries -smp 8 >>>>> >>>>> I got the following output: >>>>> | entries: 64 >>>>> | phandle 7e732468 slot 28 hash c >>>>> | phandle 7e732ad0 slot 10 hash 27 >>>>> | phandle 7e732ee8 slot 28 hash 3a >>>>> | phandle 7e734160 slot 20 hash 36 >>>>> | phandle 7e734318 slot 18 hash 3a >>>>> | phandle 7e734428 slot 28 hash 33 >>>>> | phandle 7e734538 slot 38 hash 2c >>>>> | phandle 7e734850 slot 10 hash e >>>>> | phandle 7e735220 slot 20 hash 2d >>>>> | phandle 7e735bf0 slot 30 hash d >>>>> | phandle 7e7365c0 slot 0 hash 2d >>>>> | phandle 7e736f90 slot 10 hash d >>>>> | phandle 7e737960 slot 20 hash 2d >>>>> | phandle 7e738330 slot 30 hash d >>>>> | phandle 7e738d00 slot 0 hash 2d >>>>> | phandle 7e739730 slot 30 hash 38 >>>>> | phandle 7e73bd08 slot 8 hash 17 >>>>> | phandle 7e73c2e0 slot 20 hash 32 >>>>> | phandle 7e73c7f8 slot 38 hash 37 >>>>> | phandle 7e782420 slot 20 hash 13 >>>>> | phandle 7e782ed8 slot 18 hash 1b >>>>> | phandle 7e73ce28 slot 28 hash 39 >>>>> | phandle 7e73d390 slot 10 hash 22 >>>>> | phandle 7e73d9a8 slot 28 hash 1a >>>>> | phandle 7e73dc28 slot 28 hash 37 >>>>> | phandle 7e73de00 slot 0 hash a >>>>> | phandle 7e73e028 slot 28 hash 0 >>>>> | phandle 7e7621a8 slot 28 hash 36 >>>>> | phandle 7e73e458 slot 18 hash 1e >>>>> | phandle 7e73e608 slot 8 hash 1e >>>>> | phandle 7e740078 slot 38 hash 28 >>>>> | phandle 7e740180 slot 0 hash 1d >>>>> | phandle 7e740240 slot 0 hash 33 >>>>> | phandle 7e740348 slot 8 hash 29 >>>>> | phandle 7e740410 slot 10 hash 2 >>>>> | phandle 7e740eb0 slot 30 hash 3e >>>>> | phandle 7e745390 slot 10 hash 33 >>>>> | phandle 7e747b08 slot 8 hash c >>>>> | phandle 7e748528 slot 28 hash f >>>>> | phandle 7e74a6e0 slot 20 hash 18 >>>>> | phandle 7e74aab0 slot 30 hash b >>>>> | phandle 7e74f788 slot 8 hash d >>>>> | Used entries: 8, hashed: 29 >>>>> >>>>> So the hash array has 64 entries out which only 8 are populated. Using >>>>> hash_32() populates 29 entries. >>>>> Could someone with real hardware verify this? >>>>> I'm not sure how important this performance wise, it looks just like a >>>>> waste using only 1/8 of the array. >>>> >>>> The hash used is based on the assumptions you noted, and as stated in the >>>> code, that phandle property values are in a contiguous range of 1..n >>>> (not starting from zero), which is what dtc generates. >>> >>>> We knew that for systems that do not match the assumptions that the hash >>>> will not be optimal. >>> >>> If we're going to have the phandle cache it should at least make some >>> attempt to work across the systems that Linux supports. >>> >>>> Unless there is a serious performance problem for >>>> such systems, I do not want to make the phandle hash code more complicated >>>> to optimize for these cases. And the pseries have been performing ok >>>> without phandle related performance issues that I remember hearing since >>>> before the cache was added, which could have only helped the performance. >>>> Yes, if your observations are correct, some memory is being wasted, but >>>> a 64 entry cache is not very large on a pseries. >>> >>> A single line change to use an actual hash function is hardly >>> complicating it, compared to the amount of code alr
Re: [RFC] Efficiency of the phandle_cache on ppc64/SLOF
On 12/3/19 12:35 PM, Segher Boessenkool wrote: > Hi! > > On Tue, Dec 03, 2019 at 03:03:22PM +1100, Michael Ellerman wrote: >> Sebastian Andrzej Siewior writes: >> I've certainly heard it said that on some OF's the phandle was just == >> the address of the internal representation, and I guess maybe for SLOF >> that is true. > > It is (or was). In many OFs it is just the effective address of some > node structure. SLOF runs with translation off normally. > >> They seem to vary wildly though, eg. on an Apple G5: > > Apple OF runs with translation on usually. IIRC these are effective > addresses as well. > > The OF they have on G5 machines is mostly 32-bit, for compatibility is my > guess (for userland things dealing with addresses from OF, importantly). > >> $ find /proc/device-tree/ -name phandle | xargs lsprop | head -10 >> /proc/device-tree/vsp@0,f900/veo@f918/phandle ff970848 >> /proc/device-tree/vsp@0,f900/phandle ff970360 >> /proc/device-tree/vsp@0,f900/veo@f908/phandle ff970730 >> /proc/device-tree/nvram@0,fff04000/phandle ff967fb8 >> /proc/device-tree/xmodem/phandle ff9655e8 >> /proc/device-tree/multiboot/phandle ff9504f0 >> /proc/device-tree/diagnostics/phandle ff965550 >> /proc/device-tree/options/phandle ff893cf0 >> /proc/device-tree/openprom/client-services/phandle ff8925b8 >> /proc/device-tree/openprom/phandle ff892458 >> >> That machine does not have enough RAM for those to be 32-bit real >> addresses. I think Apple OF is running in virtual mode though (?), so >> maybe they are pointers? > > Yes, I think the default is to have 8MB ram at the top of 4GB (which is > the physical address of the bootrom, btw) for OF. > >> And on an IBM pseries machine they're a bit all over the place: >> >> /proc/device-tree/cpus/PowerPC,POWER8@40/ibm,phandle 1040 >> /proc/device-tree/cpus/l2-cache@2005/ibm,phandle 2005 >> /proc/device-tree/cpus/PowerPC,POWER8@30/ibm,phandle 1030 >> /proc/device-tree/cpus/PowerPC,POWER8@20/ibm,phandle 1020 >> /proc/device-tree/cpus/PowerPC,POWER8@10/ibm,phandle 1010 >> /proc/device-tree/cpus/l2-cache@2003/ibm,phandle 2003 >> /proc/device-tree/cpus/l2-cache@200a/ibm,phandle 200a >> /proc/device-tree/cpus/l3-cache@3108/ibm,phandle 3108 >> /proc/device-tree/cpus/l2-cache@2001/ibm,phandle 2001 >> /proc/device-tree/cpus/l3-cache@3106/ibm,phandle 3106 >> /proc/device-tree/cpus/ibm,phandle fff8 >> /proc/device-tree/cpus/l3-cache@3104/ibm,phandle 3104 >> /proc/device-tree/cpus/l2-cache@2008/ibm,phandle 2008 >> /proc/device-tree/cpus/l3-cache@3102/ibm,phandle 3102 >> /proc/device-tree/cpus/l2-cache@2006/ibm,phandle 2006 >> /proc/device-tree/cpus/l3-cache@3100/ibm,phandle 3100 >> /proc/device-tree/cpus/PowerPC,POWER8@8/ibm,phandle 1008 >> /proc/device-tree/cpus/l2-cache@2004/ibm,phandle 2004 >> /proc/device-tree/cpus/PowerPC,POWER8@48/ibm,phandle 1048 >> /proc/device-tree/cpus/PowerPC,POWER8@38/ibm,phandle 1038 >> /proc/device-tree/cpus/l2-cache@2002/ibm,phandle 2002 >> /proc/device-tree/cpus/PowerPC,POWER8@28/ibm,phandle 1028 >> /proc/device-tree/cpus/l3-cache@3107/ibm,phandle 3107 >> /proc/device-tree/cpus/PowerPC,POWER8@18/ibm,phandle 1018 >> /proc/device-tree/cpus/l2-cache@2000/ibm,phandle 2000 >> /proc/device-tree/cpus/l3-cache@3105/ibm,phandle 3105 >> /proc/device-tree/cpus/l3-cache@3103/ibm,phandle 3103 >> /proc/device-tree/cpus/l3-cache@310a/ibm,phandle 310a >> /proc/device-tree/cpus/PowerPC,POWER8@0/ibm,phandle 1000 >> /proc/device-tree/cpus/l2-cache@2007/ibm,phandle 2007 >> /proc/device-tree/cpus/l3-cache@3101/ibm,phandle 3101 >> /proc/device-tree/pci@800201b/ibm,phandle 201b > > Some (the 1000) look like addresses as well. > >>> So the hash array has 64 entries out which only 8 are populated. Using >>> hash_32() populates 29 entries. > >> On the G5 it's similarly inefficient: >> [0.007379] OF: of_populate_phandle_cache(242) Used entries: 31, hashed: >> 111 > >> And some output from a "real" pseries machine (IBM OF), which is >> slightly better: >> [0.129467] OF: of_populate_phandle_cache(242) Used entries: 39, hashed: >> 81 > >> So yeah using hash_32() is quite a bit better in both cases. > > Yup, no surprise there. And hash_32 is very cheap to compute. > >> And if I'm reading your patch right it would be a single line change to>> >> switch, so that seems like it's worth doing to me. > > Agreed! > > Btw. Some OFs mangle the phandles some way, to make it easier to catch > people using it as an address (and similarly, mangle ihandles differently, > so you catch confusion between ihandles and phandles as well). Like a > simple xor, with some odd number preferably. You should assume *nothing* > about phandles, they are opaque identifiers. For arm32 machines that use dtc to generate the devicetree, which is a
Re: [RFC] Efficiency of the phandle_cache on ppc64/SLOF
On 12/2/19 10:12 PM, Michael Ellerman wrote: > Frank Rowand writes: >> On 11/29/19 9:10 AM, Sebastian Andrzej Siewior wrote: >>> I've been looking at phandle_cache and noticed the following: The raw >>> phandle value as generated by dtc starts at zero and is incremented by >>> one for each phandle entry. The qemu pSeries model is using Slof (which >>> is probably the same thing as used on real hardware) and this looks like >>> a poiner value for the phandle. >>> With >>> qemu-system-ppc64le -m 16G -machine pseries -smp 8 >>> >>> I got the following output: >>> | entries: 64 >>> | phandle 7e732468 slot 28 hash c >>> | phandle 7e732ad0 slot 10 hash 27 >>> | phandle 7e732ee8 slot 28 hash 3a >>> | phandle 7e734160 slot 20 hash 36 >>> | phandle 7e734318 slot 18 hash 3a >>> | phandle 7e734428 slot 28 hash 33 >>> | phandle 7e734538 slot 38 hash 2c >>> | phandle 7e734850 slot 10 hash e >>> | phandle 7e735220 slot 20 hash 2d >>> | phandle 7e735bf0 slot 30 hash d >>> | phandle 7e7365c0 slot 0 hash 2d >>> | phandle 7e736f90 slot 10 hash d >>> | phandle 7e737960 slot 20 hash 2d >>> | phandle 7e738330 slot 30 hash d >>> | phandle 7e738d00 slot 0 hash 2d >>> | phandle 7e739730 slot 30 hash 38 >>> | phandle 7e73bd08 slot 8 hash 17 >>> | phandle 7e73c2e0 slot 20 hash 32 >>> | phandle 7e73c7f8 slot 38 hash 37 >>> | phandle 7e782420 slot 20 hash 13 >>> | phandle 7e782ed8 slot 18 hash 1b >>> | phandle 7e73ce28 slot 28 hash 39 >>> | phandle 7e73d390 slot 10 hash 22 >>> | phandle 7e73d9a8 slot 28 hash 1a >>> | phandle 7e73dc28 slot 28 hash 37 >>> | phandle 7e73de00 slot 0 hash a >>> | phandle 7e73e028 slot 28 hash 0 >>> | phandle 7e7621a8 slot 28 hash 36 >>> | phandle 7e73e458 slot 18 hash 1e >>> | phandle 7e73e608 slot 8 hash 1e >>> | phandle 7e740078 slot 38 hash 28 >>> | phandle 7e740180 slot 0 hash 1d >>> | phandle 7e740240 slot 0 hash 33 >>> | phandle 7e740348 slot 8 hash 29 >>> | phandle 7e740410 slot 10 hash 2 >>> | phandle 7e740eb0 slot 30 hash 3e >>> | phandle 7e745390 slot 10 hash 33 >>> | phandle 7e747b08 slot 8 hash c >>> | phandle 7e748528 slot 28 hash f >>> | phandle 7e74a6e0 slot 20 hash 18 >>> | phandle 7e74aab0 slot 30 hash b >>> | phandle 7e74f788 slot 8 hash d >>> | Used entries: 8, hashed: 29 >>> >>> So the hash array has 64 entries out which only 8 are populated. Using >>> hash_32() populates 29 entries. >>> Could someone with real hardware verify this? >>> I'm not sure how important this performance wise, it looks just like a >>> waste using only 1/8 of the array. >> >> The hash used is based on the assumptions you noted, and as stated in the >> code, that phandle property values are in a contiguous range of 1..n >> (not starting from zero), which is what dtc generates. > >> We knew that for systems that do not match the assumptions that the hash >> will not be optimal. > > If we're going to have the phandle cache it should at least make some > attempt to work across the systems that Linux supports. > >> Unless there is a serious performance problem for >> such systems, I do not want to make the phandle hash code more complicated >> to optimize for these cases. And the pseries have been performing ok >> without phandle related performance issues that I remember hearing since >> before the cache was added, which could have only helped the performance. >> Yes, if your observations are correct, some memory is being wasted, but >> a 64 entry cache is not very large on a pseries. > > A single line change to use an actual hash function is hardly > complicating it, compared to the amount of code already there. With a dtc generated devicetree, the hash is perfect, with no misses. That single line change then makes the hash bad for dtc generated devicetrees. The cache was added to address a problem with a system with a dtc generated devicetree. I had not heard of any phandle lookup performance issues on ppc systems. An imperfect hash for ppc should not make the ppc performance worse (unless every single phandle value hashes to a single bucket). So the ppc performance is likely better than it was before the hash was added, even without an optimal hash algorithm. So the change would not be a single line change. It would be a change to use different hash algorithms for dtc generated device trees versus other device trees. Another possibility would be to make the cache be dependent upon not CONFIG_PPC. It might be possible to disable the cache with a minimal code change. > > cheers >
Re: [RFC] Efficiency of the phandle_cache on ppc64/SLOF
On 11/29/19 9:10 AM, Sebastian Andrzej Siewior wrote: > I've been looking at phandle_cache and noticed the following: The raw > phandle value as generated by dtc starts at zero and is incremented by > one for each phandle entry. The qemu pSeries model is using Slof (which > is probably the same thing as used on real hardware) and this looks like > a poiner value for the phandle. > With > qemu-system-ppc64le -m 16G -machine pseries -smp 8 > > I got the following output: > | entries: 64 > | phandle 7e732468 slot 28 hash c > | phandle 7e732ad0 slot 10 hash 27 > | phandle 7e732ee8 slot 28 hash 3a > | phandle 7e734160 slot 20 hash 36 > | phandle 7e734318 slot 18 hash 3a > | phandle 7e734428 slot 28 hash 33 > | phandle 7e734538 slot 38 hash 2c > | phandle 7e734850 slot 10 hash e > | phandle 7e735220 slot 20 hash 2d > | phandle 7e735bf0 slot 30 hash d > | phandle 7e7365c0 slot 0 hash 2d > | phandle 7e736f90 slot 10 hash d > | phandle 7e737960 slot 20 hash 2d > | phandle 7e738330 slot 30 hash d > | phandle 7e738d00 slot 0 hash 2d > | phandle 7e739730 slot 30 hash 38 > | phandle 7e73bd08 slot 8 hash 17 > | phandle 7e73c2e0 slot 20 hash 32 > | phandle 7e73c7f8 slot 38 hash 37 > | phandle 7e782420 slot 20 hash 13 > | phandle 7e782ed8 slot 18 hash 1b > | phandle 7e73ce28 slot 28 hash 39 > | phandle 7e73d390 slot 10 hash 22 > | phandle 7e73d9a8 slot 28 hash 1a > | phandle 7e73dc28 slot 28 hash 37 > | phandle 7e73de00 slot 0 hash a > | phandle 7e73e028 slot 28 hash 0 > | phandle 7e7621a8 slot 28 hash 36 > | phandle 7e73e458 slot 18 hash 1e > | phandle 7e73e608 slot 8 hash 1e > | phandle 7e740078 slot 38 hash 28 > | phandle 7e740180 slot 0 hash 1d > | phandle 7e740240 slot 0 hash 33 > | phandle 7e740348 slot 8 hash 29 > | phandle 7e740410 slot 10 hash 2 > | phandle 7e740eb0 slot 30 hash 3e > | phandle 7e745390 slot 10 hash 33 > | phandle 7e747b08 slot 8 hash c > | phandle 7e748528 slot 28 hash f > | phandle 7e74a6e0 slot 20 hash 18 > | phandle 7e74aab0 slot 30 hash b > | phandle 7e74f788 slot 8 hash d > | Used entries: 8, hashed: 29 > > So the hash array has 64 entries out which only 8 are populated. Using > hash_32() populates 29 entries. > Could someone with real hardware verify this? > I'm not sure how important this performance wise, it looks just like a > waste using only 1/8 of the array. The hash used is based on the assumptions you noted, and as stated in the code, that phandle property values are in a contiguous range of 1..n (not starting from zero), which is what dtc generates. We knew that for systems that do not match the assumptions that the hash will not be optimal. Unless there is a serious performance problem for such systems, I do not want to make the phandle hash code more complicated to optimize for these cases. And the pseries have been performing ok without phandle related performance issues that I remember hearing since before the cache was added, which could have only helped the performance. Yes, if your observations are correct, some memory is being wasted, but a 64 entry cache is not very large on a pseries. There is already some push back from Rob that the existing code is more complex than needed (eg variable cache size). -Frank > > The patch used for testing: > > diff --git a/drivers/of/base.c b/drivers/of/base.c > index 1d667eb730e19..2640d4bc81a9a 100644 > --- a/drivers/of/base.c > +++ b/drivers/of/base.c > @@ -197,6 +197,7 @@ void of_populate_phandle_cache(void) > u32 cache_entries; > struct device_node *np; > u32 phandles = 0; > + struct device_node **cache2; > > raw_spin_lock_irqsave(&devtree_lock, flags); > > @@ -214,14 +215,32 @@ void of_populate_phandle_cache(void) > > phandle_cache = kcalloc(cache_entries, sizeof(*phandle_cache), > GFP_ATOMIC); > + cache2 = kcalloc(cache_entries, sizeof(*phandle_cache), GFP_ATOMIC); > if (!phandle_cache) > goto out; > > + pr_err("%s(%d) entries: %d\n", __func__, __LINE__, cache_entries); > for_each_of_allnodes(np) > if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL) { > + int slot; > of_node_get(np); > phandle_cache[np->phandle & phandle_cache_mask] = np; > + slot = hash_32(np->phandle, __ffs(cache_entries)); > + cache2[slot] = np; > + pr_err("%s(%d) phandle %x slot %x hash %x\n", __func__, > __LINE__, > +np->phandle, np->phandle & phandle_cache_mask, > slot); > } > + { > + int i, filled = 0, filled_hash = 0; > + > + for (i = 0; i < cache_entries; i++) { > + if (phandle_cache[i]) > + filled++; > + if (cache2[i]) > + filled_hash++; > + } > + pr_err("%s(%d) Used entries: %d, hashed: %d\n", __func__, > _
Re: [PATCH v2 2/2] of: __of_detach_node() - remove node from phandle cache
On 12/18/18 12:09 PM, Frank Rowand wrote: > On 12/18/18 12:01 PM, Rob Herring wrote: >> On Tue, Dec 18, 2018 at 12:57 PM Frank Rowand wrote: >>> >>> On 12/17/18 2:52 AM, Michael Ellerman wrote: >>>> Hi Frank, >>>> >>>> frowand.l...@gmail.com writes: >>>>> From: Frank Rowand >>>>> >>>>> Non-overlay dynamic devicetree node removal may leave the node in >>>>> the phandle cache. Subsequent calls to of_find_node_by_phandle() >>>>> will incorrectly find the stale entry. Remove the node from the >>>>> cache. >>>>> >>>>> Add paranoia checks in of_find_node_by_phandle() as a second level >>>>> of defense (do not return cached node if detached, do not add node >>>>> to cache if detached). >>>>> >>>>> Reported-by: Michael Bringmann >>>>> Signed-off-by: Frank Rowand >>>>> --- >>>> >>>> Similarly here can we add: >>>> >>>> Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of >>>> of_find_node_by_phandle()") >>> >>> Yes, thanks. >>> >>> >>>> Cc: sta...@vger.kernel.org # v4.17+ >>> >>> Nope, 0b3ce78e90fc does not belong in stable (it is a feature, not a bug >>> fix). So the bug will not be in stable. >> >> 0b3ce78e90fc landed in v4.17, so Michael's line above is correct. >> Annotating it with 4.17 only saves Greg from trying and then emailing >> us to backport this patch as it wouldn't apply. > > Thanks for the correction. I was both under-thinking and over-thinking, > ending up with an incorrect answer. > > Can you add the Cc: to version 3 patch comments (both 1/2 and 2/2) or do > you want me to re-spin? Now that my thinking has been straightened out, a little bit more checking for the other pre-requisite patches show: v4.18: commit b9952b5218ad ("of: overlay: update phandle cache on overlay apply and remove") v4.19: commit e54192b48da7 ("of: fix phandle cache creation for DTs with no phandles") These can be addressed by changing the "Cc:" to ... # v4.19+ because stable v4.17.* and v4.18.* are end of life. Or the pre-requisites can be listed: # v4.17: b9952b5218ad of: overlay: update phandle cache # v4.17: e54192b48da7 of: fix phandle cache creation # v4.17 # v4.18: e54192b48da7 of: fix phandle cache creation # v4.18 # v4.19+ Do you have a preference? -Frank > > -Frank > >> >> Rob >> > >
Re: [PATCH v2 2/2] of: __of_detach_node() - remove node from phandle cache
On 12/18/18 12:01 PM, Rob Herring wrote: > On Tue, Dec 18, 2018 at 12:57 PM Frank Rowand wrote: >> >> On 12/17/18 2:52 AM, Michael Ellerman wrote: >>> Hi Frank, >>> >>> frowand.l...@gmail.com writes: >>>> From: Frank Rowand >>>> >>>> Non-overlay dynamic devicetree node removal may leave the node in >>>> the phandle cache. Subsequent calls to of_find_node_by_phandle() >>>> will incorrectly find the stale entry. Remove the node from the >>>> cache. >>>> >>>> Add paranoia checks in of_find_node_by_phandle() as a second level >>>> of defense (do not return cached node if detached, do not add node >>>> to cache if detached). >>>> >>>> Reported-by: Michael Bringmann >>>> Signed-off-by: Frank Rowand >>>> --- >>> >>> Similarly here can we add: >>> >>> Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of >>> of_find_node_by_phandle()") >> >> Yes, thanks. >> >> >>> Cc: sta...@vger.kernel.org # v4.17+ >> >> Nope, 0b3ce78e90fc does not belong in stable (it is a feature, not a bug >> fix). So the bug will not be in stable. > > 0b3ce78e90fc landed in v4.17, so Michael's line above is correct. > Annotating it with 4.17 only saves Greg from trying and then emailing > us to backport this patch as it wouldn't apply. Thanks for the correction. I was both under-thinking and over-thinking, ending up with an incorrect answer. Can you add the Cc: to version 3 patch comments (both 1/2 and 2/2) or do you want me to re-spin? -Frank > > Rob >
Re: [PATCH v3 2/2] of: __of_detach_node() - remove node from phandle cache
On 12/18/18 11:40 AM, frowand.l...@gmail.com wrote: > From: Frank Rowand > > Non-overlay dynamic devicetree node removal may leave the node in > the phandle cache. Subsequent calls to of_find_node_by_phandle() > will incorrectly find the stale entry. Remove the node from the > cache. > > Add paranoia checks in of_find_node_by_phandle() as a second level > of defense (do not return cached node if detached, do not add node > to cache if detached). > > Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of > of_find_node_by_phandle()") > Reported-by: Michael Bringmann > Signed-off-by: Frank Rowand > --- > > do not "cc: stable", unless the following commits are also in stable: > > commit e54192b48da7 ("of: fix phandle cache creation for DTs with no > phandles") > commit b9952b5218ad ("of: overlay: update phandle cache on overlay apply > and remove") > commit 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of > of_find_node_by_phandle()") > > > Changes since v2: > - add temporary variable np in __of_free_phandle_cache_entry() to improve > readability > - explain reason for WARN_ON() in comment > - add Fixes tag in patch comment I should have carried this forward: changes since v1: - add WARN_ON(1) for unexpected condition in of_find_node_by_phandle() -Frank > > drivers/of/base.c | 31 ++- > drivers/of/dynamic.c| 3 +++ > drivers/of/of_private.h | 4 > 3 files changed, 37 insertions(+), 1 deletion(-) > > diff --git a/drivers/of/base.c b/drivers/of/base.c > index 6c33d63361b8..6d20b6dcf034 100644 > --- a/drivers/of/base.c > +++ b/drivers/of/base.c > @@ -162,6 +162,28 @@ int of_free_phandle_cache(void) > late_initcall_sync(of_free_phandle_cache); > #endif > > +/* > + * Caller must hold devtree_lock. > + */ > +void __of_free_phandle_cache_entry(phandle handle) > +{ > + phandle masked_handle; > + struct device_node *np; > + > + if (!handle) > + return; > + > + masked_handle = handle & phandle_cache_mask; > + > + if (phandle_cache) { > + np = phandle_cache[masked_handle]; > + if (np && handle == np->phandle) { > + of_node_put(np); > + phandle_cache[masked_handle] = NULL; > + } > + } > +} > + > void of_populate_phandle_cache(void) > { > unsigned long flags; > @@ -1209,11 +1231,18 @@ struct device_node *of_find_node_by_phandle(phandle > handle) > if (phandle_cache[masked_handle] && > handle == phandle_cache[masked_handle]->phandle) > np = phandle_cache[masked_handle]; > + if (np && of_node_check_flag(np, OF_DETACHED)) { > + WARN_ON(1); /* did not uncache np on node removal */ > + of_node_put(np); > + phandle_cache[masked_handle] = NULL; > + np = NULL; > + } > } > > if (!np) { > for_each_of_allnodes(np) > - if (np->phandle == handle) { > + if (np->phandle == handle && > + !of_node_check_flag(np, OF_DETACHED)) { > if (phandle_cache) { > /* will put when removed from cache */ > of_node_get(np); > diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c > index f4f8ed9b5454..ecea92f68c87 100644 > --- a/drivers/of/dynamic.c > +++ b/drivers/of/dynamic.c > @@ -268,6 +268,9 @@ void __of_detach_node(struct device_node *np) > } > > of_node_set_flag(np, OF_DETACHED); > + > + /* race with of_find_node_by_phandle() prevented by devtree_lock */ > + __of_free_phandle_cache_entry(np->phandle); > } > > /** > diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h > index 5d1567025358..24786818e32e 100644 > --- a/drivers/of/of_private.h > +++ b/drivers/of/of_private.h > @@ -84,6 +84,10 @@ static inline void __of_detach_node_sysfs(struct > device_node *np) {} > int of_resolve_phandles(struct device_node *tree); > #endif > > +#if defined(CONFIG_OF_DYNAMIC) > +void __of_free_phandle_cache_entry(phandle handle); > +#endif > + > #if defined(CONFIG_OF_OVERLAY) > void of_overlay_mutex_lock(void); > void of_overlay_mutex_unlock(void); >
Re: [PATCH v3 1/2] of: of_node_get()/of_node_put() nodes held in phandle cache
On 12/18/18 11:40 AM, frowand.l...@gmail.com wrote: > From: Frank Rowand > > The phandle cache contains struct device_node pointers. The refcount > of the pointers was not incremented while in the cache, allowing use > after free error after kfree() of the node. Add the proper increment > and decrement of the use count. > > Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of > of_find_node_by_phandle()") > > Signed-off-by: Frank Rowand > --- > > do not "cc: stable", unless the following commits are also in stable: > > commit e54192b48da7 ("of: fix phandle cache creation for DTs with no > phandles") > commit b9952b5218ad ("of: overlay: update phandle cache on overlay apply > and remove") > commit 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of > of_find_node_by_phandle()") I should have carried this forward: changes since v1 - make __of_free_phandle_cache() static -Frank > > > drivers/of/base.c | 70 > --- > 1 file changed, 46 insertions(+), 24 deletions(-) > > diff --git a/drivers/of/base.c b/drivers/of/base.c > index 09692c9b32a7..6c33d63361b8 100644 > --- a/drivers/of/base.c > +++ b/drivers/of/base.c > @@ -116,9 +116,6 @@ int __weak of_node_to_nid(struct device_node *np) > } > #endif > > -static struct device_node **phandle_cache; > -static u32 phandle_cache_mask; > - > /* > * Assumptions behind phandle_cache implementation: > * - phandle property values are in a contiguous range of 1..n > @@ -127,6 +124,44 @@ int __weak of_node_to_nid(struct device_node *np) > * - the phandle lookup overhead reduction provided by the cache > * will likely be less > */ > + > +static struct device_node **phandle_cache; > +static u32 phandle_cache_mask; > + > +/* > + * Caller must hold devtree_lock. > + */ > +static void __of_free_phandle_cache(void) > +{ > + u32 cache_entries = phandle_cache_mask + 1; > + u32 k; > + > + if (!phandle_cache) > + return; > + > + for (k = 0; k < cache_entries; k++) > + of_node_put(phandle_cache[k]); > + > + kfree(phandle_cache); > + phandle_cache = NULL; > +} > + > +int of_free_phandle_cache(void) > +{ > + unsigned long flags; > + > + raw_spin_lock_irqsave(&devtree_lock, flags); > + > + __of_free_phandle_cache(); > + > + raw_spin_unlock_irqrestore(&devtree_lock, flags); > + > + return 0; > +} > +#if !defined(CONFIG_MODULES) > +late_initcall_sync(of_free_phandle_cache); > +#endif > + > void of_populate_phandle_cache(void) > { > unsigned long flags; > @@ -136,8 +171,7 @@ void of_populate_phandle_cache(void) > > raw_spin_lock_irqsave(&devtree_lock, flags); > > - kfree(phandle_cache); > - phandle_cache = NULL; > + __of_free_phandle_cache(); > > for_each_of_allnodes(np) > if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL) > @@ -155,30 +189,15 @@ void of_populate_phandle_cache(void) > goto out; > > for_each_of_allnodes(np) > - if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL) > + if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL) { > + of_node_get(np); > phandle_cache[np->phandle & phandle_cache_mask] = np; > + } > > out: > raw_spin_unlock_irqrestore(&devtree_lock, flags); > } > > -int of_free_phandle_cache(void) > -{ > - unsigned long flags; > - > - raw_spin_lock_irqsave(&devtree_lock, flags); > - > - kfree(phandle_cache); > - phandle_cache = NULL; > - > - raw_spin_unlock_irqrestore(&devtree_lock, flags); > - > - return 0; > -} > -#if !defined(CONFIG_MODULES) > -late_initcall_sync(of_free_phandle_cache); > -#endif > - > void __init of_core_init(void) > { > struct device_node *np; > @@ -1195,8 +1214,11 @@ struct device_node *of_find_node_by_phandle(phandle > handle) > if (!np) { > for_each_of_allnodes(np) > if (np->phandle == handle) { > - if (phandle_cache) > + if (phandle_cache) { > + /* will put when removed from cache */ > + of_node_get(np); > phandle_cache[masked_handle] = np; > + } > break; > } > } >
Re: [PATCH v2 2/2] of: __of_detach_node() - remove node from phandle cache
On 12/17/18 2:52 AM, Michael Ellerman wrote: > Hi Frank, > > frowand.l...@gmail.com writes: >> From: Frank Rowand >> >> Non-overlay dynamic devicetree node removal may leave the node in >> the phandle cache. Subsequent calls to of_find_node_by_phandle() >> will incorrectly find the stale entry. Remove the node from the >> cache. >> >> Add paranoia checks in of_find_node_by_phandle() as a second level >> of defense (do not return cached node if detached, do not add node >> to cache if detached). >> >> Reported-by: Michael Bringmann >> Signed-off-by: Frank Rowand >> --- > > Similarly here can we add: > > Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of > of_find_node_by_phandle()") Yes, thanks. > Cc: sta...@vger.kernel.org # v4.17+ Nope, 0b3ce78e90fc does not belong in stable (it is a feature, not a bug fix). So the bug will not be in stable. I've debated with myself over this, because there is a possibility that 0b3ce78e90fc could somehow be put into a stable despite not being a bug fix. We can always explicitly request this patch series be added to stable in that case. > Thanks for doing this series. > > Some minor comments below. > >> diff --git a/drivers/of/base.c b/drivers/of/base.c >> index 6c33d63361b8..ad71864cecf5 100644 >> --- a/drivers/of/base.c >> +++ b/drivers/of/base.c >> @@ -162,6 +162,27 @@ int of_free_phandle_cache(void) >> late_initcall_sync(of_free_phandle_cache); >> #endif >> >> +/* >> + * Caller must hold devtree_lock. >> + */ >> +void __of_free_phandle_cache_entry(phandle handle) >> +{ >> +phandle masked_handle; >> + >> +if (!handle) >> +return; > > We could fold the phandle_cache check into that if and return early for > both cases couldn't we? We could, but that would make the reason for checking phandle_cache less obvious. I would rather leave that check > >> +masked_handle = handle & phandle_cache_mask; >> + >> +if (phandle_cache) { > > Meaning this wouldn't be necessary. > >> +if (phandle_cache[masked_handle] && >> +handle == phandle_cache[masked_handle]->phandle) { >> +of_node_put(phandle_cache[masked_handle]); >> +phandle_cache[masked_handle] = NULL; >> +} > > A temporary would help the readability here I think, eg: > > struct device_node *np; > np = phandle_cache[masked_handle]; > > if (np && handle == np->phandle) { > of_node_put(np); > phandle_cache[masked_handle] = NULL; > } Yes, much cleaner. >> @@ -1209,11 +1230,18 @@ struct device_node *of_find_node_by_phandle(phandle >> handle) >> if (phandle_cache[masked_handle] && >> handle == phandle_cache[masked_handle]->phandle) >> np = phandle_cache[masked_handle]; >> +if (np && of_node_check_flag(np, OF_DETACHED)) { >> +WARN_ON(1); >> +of_node_put(np); > > Do we really want to do the put here? > > We're here because something has gone wrong, possibly even memory > corruption such that np is not even pointing at a device node anymore. > So it seems like it would be safer to just leave the ref count alone, > possibly leak a small amount of memory, and NULL out the reference. I like the concept of the code being a little bit paranoid. But the bug that this check is likely to cache is the bug that led to this series -- removing a devicetree node, but failing to remove it from the cache as part of the removal. So I think I'll leave it as is. > > > cheers > Thanks for the thoughts and suggestions! -Frank
Re: [PATCH 1/2] of: of_node_get()/of_node_put() nodes held in phandle cache
On 12/14/18 2:47 PM, Frank Rowand wrote: > On 12/14/18 9:15 AM, Rob Herring wrote: >> On Fri, Dec 14, 2018 at 12:43 AM wrote: >>> >>> From: Frank Rowand >>> >>> The phandle cache contains struct device_node pointers. The refcount >>> of the pointers was not incremented while in the cache, allowing use >>> after free error after kfree() of the node. Add the proper increment >>> and decrement of the use count. >> >> Since we pre-populate the cache at boot, all the nodes will have a ref >> count and will never be freed unless we happen to repopulate the whole >> cache. That doesn't seem ideal. The node pointer is not "in use" just >> because it is in the cache. I forgot to reply to this sentence. The node pointers are "in use" because of_find_node_by_phandle() will use the pointers to access the phandle field. This is a use after free bug if the node has been kfree()'ed. >> >> Rob >> > > This patch also adds of_node_put() so that the refcount will go to zero > when the node is removed as part of an overlay remove, if the node was > added by an overlay. > > Patch 2/2 adds the free cache entry call to __of_detach_node(), so the > refcount will go to zero when the node is removed for dynamic use cases > other than overlays. (For overlays, all nodes are instead removed from > the cache before __of_detach_node() is called.) > > -Frank >
Re: [PATCH 1/2] of: of_node_get()/of_node_put() nodes held in phandle cache
On 12/14/18 9:15 AM, Rob Herring wrote: > On Fri, Dec 14, 2018 at 12:43 AM wrote: >> >> From: Frank Rowand >> >> The phandle cache contains struct device_node pointers. The refcount >> of the pointers was not incremented while in the cache, allowing use >> after free error after kfree() of the node. Add the proper increment >> and decrement of the use count. > > Since we pre-populate the cache at boot, all the nodes will have a ref > count and will never be freed unless we happen to repopulate the whole > cache. That doesn't seem ideal. The node pointer is not "in use" just > because it is in the cache. > > Rob > This patch also adds of_node_put() so that the refcount will go to zero when the node is removed as part of an overlay remove, if the node was added by an overlay. Patch 2/2 adds the free cache entry call to __of_detach_node(), so the refcount will go to zero when the node is removed for dynamic use cases other than overlays. (For overlays, all nodes are instead removed from the cache before __of_detach_node() is called.) -Frank
Re: [PATCH 2/2] of: __of_detach_node() - remove node from phandle cache
On 12/14/18 1:56 PM, Michael Bringmann wrote: > On 12/14/2018 11:20 AM, Rob Herring wrote: >> On Fri, Dec 14, 2018 at 12:43 AM wrote: >>> >>> From: Frank Rowand >>> >>> Non-overlay dynamic devicetree node removal may leave the node in >>> the phandle cache. Subsequent calls to of_find_node_by_phandle() >>> will incorrectly find the stale entry. Remove the node from the >>> cache. >>> >>> Add paranoia checks in of_find_node_by_phandle() as a second level >>> of defense (do not return cached node if detached, do not add node >>> to cache if detached). >>> >>> Reported-by: Michael Bringmann >>> Signed-off-by: Frank Rowand >>> --- >>> drivers/of/base.c | 29 - >>> drivers/of/dynamic.c| 3 +++ >>> drivers/of/of_private.h | 4 >>> 3 files changed, 35 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/of/base.c b/drivers/of/base.c >>> index d599367cb92a..34a5125713c8 100644 >>> --- a/drivers/of/base.c >>> +++ b/drivers/of/base.c >>> @@ -162,6 +162,27 @@ int of_free_phandle_cache(void) >>> late_initcall_sync(of_free_phandle_cache); >>> #endif >>> >>> +/* >>> + * Caller must hold devtree_lock. >>> + */ >>> +void __of_free_phandle_cache_entry(phandle handle) >>> +{ >>> + phandle masked_handle; >>> + >>> + if (!handle) >>> + return; >>> + >>> + masked_handle = handle & phandle_cache_mask; >>> + >>> + if (phandle_cache) { >>> + if (phandle_cache[masked_handle] && >>> + handle == phandle_cache[masked_handle]->phandle) { >>> + of_node_put(phandle_cache[masked_handle]); >>> + phandle_cache[masked_handle] = NULL; >>> + } >>> + } >>> +} >>> + >>> void of_populate_phandle_cache(void) >>> { >>> unsigned long flags; >>> @@ -1209,11 +1230,17 @@ struct device_node *of_find_node_by_phandle(phandle >>> handle) >>> if (phandle_cache[masked_handle] && >>> handle == phandle_cache[masked_handle]->phandle) >>> np = phandle_cache[masked_handle]; >>> + if (np && of_node_check_flag(np, OF_DETACHED)) { >>> + of_node_put(np); >>> + phandle_cache[masked_handle] = NULL; >> >> This should never happen, right? Any time we set OF_DETACHED, the >> entry should get removed from the cache. I think we want a WARN here >> in case we're in an unexpected state. Correct, this should never happen. I will add the WARN. > We don't actually remove the pointer from the phandle cache when we set > OF_DETACHED in drivers/of/dynamic.c:__of_detach_node. The phandle cache > is currently static within drivers/of/base.c. There are a couple of > calls to of_populate_phandle_cache / of_free_phandle_cache within > drivers/of/overlay.c, but these are not involved in the device tree > updates that occur during LPAR migration. A WARN here would only make > sense, if we also arrange to clear the handle. Rob's reply did not include the full patch 2/2. The full patch 2/2 also adds a call to __of_free_phandle_cache_entry() in __of_detach_node(). -Frank > >> >> Rob > > Michael > >> >> >
Re: [PATCH 0/2] of: phandle_cache, fix refcounts, remove stale entry
Hi Michael Bringmann, On 12/13/18 10:42 PM, frowand.l...@gmail.com wrote: > From: Frank Rowand > > Non-overlay dynamic devicetree node removal may leave the node in > the phandle cache. Subsequent calls to of_find_node_by_phandle() > will incorrectly find the stale entry. This bug exposed the foloowing > phandle cache refcount bug. > > The refcount of phandle_cache entries is not incremented while in > the cache, allowing use after free error after kfree() of the > cached entry. > > Frank Rowand (2): > of: of_node_get()/of_node_put() nodes held in phandle cache > of: __of_detach_node() - remove node from phandle cache > > drivers/of/base.c | 99 > - > drivers/of/dynamic.c| 3 ++ > drivers/of/of_private.h | 4 ++ > 3 files changed, 81 insertions(+), 25 deletions(-) > Can you please test that these patches fix the problem that you reported in: [PATCH v03] powerpc/mobility: Fix node detach/rename problem Thanks, Frank
Re: [PATCH v03] powerpc/mobility: Fix node detach/rename problem
Hi Michael Bringmann, On 12/11/18 8:07 AM, Rob Herring wrote: > On Tue, Dec 11, 2018 at 7:29 AM Michael Ellerman wrote: >> >> Hi Michael, >> >> Please Cc the device tree folks on device tree patches, and also the >> original author of the patch that added the code you're modifying. >> >> So I've added: >> robh...@kernel.org >> frowand.l...@gmail.com >> devicet...@vger.kernel.org >> linux-ker...@vger.kernel.org >> >> Michael Bringmann writes: >>> The PPC mobility code receives RTAS requests to delete nodes with >>> platform-/hardware-specific attributes when restarting the kernel >>> after a migration. My example is for migration between a P8 Alpine >>> and a P8 Brazos. Nodes to be deleted include 'ibm,random-v1', >>> 'ibm,platform-facilities', 'ibm,sym-encryption-v1', and, >>> 'ibm,compression-v1'. >>> >>> The mobility.c code calls 'of_detach_node' for the nodes and their >>> children. This makes calls to detach the properties and to remove >>> the associated sysfs/kernfs files. >>> >>> Then new copies of the same nodes are next provided by the PHYP, >>> local copies are built, and a pointer to the 'struct device_node' >>> is passed to of_attach_node. Before the call to of_attach_node, >>> the phandle is initialized to 0 when the data structure is alloced. >>> During the call to of_attach_node, it calls __of_attach_node which >>> pulls the actual name and phandle from just created sub-properties >>> named something like 'name' and 'ibm,phandle'. >>> >>> This is all fine for the first migration. The problem occurs with >>> the second and subsequent migrations when the PHYP on the new system >>> wants to replace the same set of nodes again, referenced with the >>> same names and phandle values. >>> >>> On the second and subsequent migrations, the PHYP tells the system >>> to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1', >>> 'ibm,compression-v1', 'ibm,sym-encryption-v1'. It specifies these >>> nodes by its known set of phandle values -- the same handles used >>> by the PHYP on the source system are known on the target system. >>> The mobility.c code calls of_find_node_by_phandle() with these values >>> and ends up locating the first instance of each node that was added >>> during the original boot, instead of the second instance of each node >>> created after the first migration. The detach during the second >>> migration fails with errors like, >>> >>> [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 >>> __of_detach_node+0x8/0xa0 >>> [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag >>> inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc >>> xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod >>> ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod >>> [ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: GW >>> 4.18.0-rc1-wi107836-v05-120+ #201 >>> [ 4565.030737] NIP: c07c1ea8 LR: c07c1fb4 CTR: >>> 00655170 >>> [ 4565.030741] REGS: c003f302b690 TRAP: 0700 Tainted: GW >>> (4.18.0-rc1-wi107836-v05-120+) >>> [ 4565.030745] MSR: 80010282b033 >>> CR: 22288822 XER: 000a >>> [ 4565.030757] CFAR: c07c1fb0 IRQMASK: 1 >>> [ 4565.030757] GPR00: c07c1fa4 c003f302b910 c114bf00 >>> c0038e68 >>> [ 4565.030757] GPR04: 0001 80c008e0b4b8 >>> >>> [ 4565.030757] GPR08: 0001 8003 >>> 2843 >>> [ 4565.030757] GPR12: 8800 c0001ec9ae00 4000 >>> >>> [ 4565.030757] GPR16: 0008 >>> f6ff >>> [ 4565.030757] GPR20: 0007 c003e9f1f034 >>> 0001 >>> [ 4565.030757] GPR24: >>> >>> [ 4565.030757] GPR28: c1549d28 c1134828 c0038e68 >>> c003f302b930 >>> [ 4565.030804] NIP [c07c1ea8] __of_detach_node+0x8/0xa0 >>> [ 4565.030808] LR [c07c1fb4] of_detach_node+0x74/0xd0 >>> [ 4565.030811] Call Trace: >>> [ 4565.030815] [c003f302b910] [c07c1fa4] >>> of_detach_node+0x64/0xd0 (unreliable) >>> [ 4565.030821] [c003f302b980] [c00c33c4] >>> dlpar_detach_node+0xb4/0x150 >>> [ 4565.030826] [c003f302ba10] [c00c3ffc] >>> delete_dt_node+0x3c/0x80 >>> [ 4565.030831] [c003f302ba40] [c00c4380] >>> pseries_devicetree_update+0x150/0x4f0 >>> [ 4565.030836] [c003f302bb70] [c00c479c] >>> post_mobility_fixup+0x7c/0xf0 >>> [ 4565.030841] [c003f302bbe0] [c00c4908] >>> migration_store+0xf8/0x130 >>> [ 4565.030847] [c003f302bc70] [c0998160] >>> kobj_attr_store+0x30/0x60 >>> [ 4565.030852] [c003f302bc90] [c0412f14] >>> sysfs_kf_write+0x64/0xa0 >>> [ 45
Re: [PATCH] of: add dtc annotations functionality to dtx_diff
Hi Rob, I got the cc: list wrong on this patch, please dis-regard. I will resend (with unchanged version) to the correct cc: list. -Frank On 11/26/18 3:54 AM, frowand.l...@gmail.com wrote: > From: Frank Rowand > > Add -T and --annotations command line arguments to dtx_diff. These > arguments will be passed through to dtc. dtc will then add source > location annotations to its output. > > Signed-off-by: Frank Rowand > --- > > This feature depends upon commit 5667e7ef9a9a ("annotations: add the > annotation functionality") in the dtc git repository. To use the > new flags before the new version of dtc is imported to the linux > kernel, download the dtc repository, compile dtc with the make command, > then add the path of the dtc repository to the shell PATH variable. > > scripts/dtc/dtx_diff | 13 - > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/scripts/dtc/dtx_diff b/scripts/dtc/dtx_diff > index 8c4fbad2055e..0d8572008729 100755 > --- a/scripts/dtc/dtx_diff > +++ b/scripts/dtc/dtx_diff > @@ -21,6 +21,7 @@ Usage: > diff DTx_1 and DTx_2 > > > + --annotatesynonym for -T > -f print full dts in diff (--unified=9) > -h synonym for --help > -helpsynonym for --help > @@ -28,6 +29,7 @@ Usage: > -s SRCTREE linux kernel source tree is at path SRCTREE > (default is current directory) > -S linux kernel source tree is at root of current git repo > + -T Annotate output .dts with input source file and line (-T > -T for more details) > -u unsorted, do not sort DTx > > > @@ -174,6 +176,7 @@ compile_to_dts() { > > # - start of script > > +annotate="" > cmd_diff=0 > diff_flags="-u" > dtx_file_1="" > @@ -208,6 +211,14 @@ while [ $# -gt 0 ] ; do > shift > ;; > > + -T | --annotate ) > + if [ "${annotate}" = "" ] ; then > + annotate="-T" > + elif [ "${annotate}" = "-T" ] ; then > + annotate="-T -T" > + fi > + shift > + ;; > -u ) > dtc_sort="" > shift > @@ -327,7 +338,7 @@ cpp_flags="\ > DTC="\ > ${DTC} \ > -i ${srctree}/scripts/dtc/include-prefixes \ > - -O dts -qq -f ${dtc_sort} -o -" > + -O dts -qq -f ${dtc_sort} ${annotate} -o -" > > > # - do the diff or decompile >
Re: [GIT PULL] of: overlay: validation checks, subsequent fixes for v20 -- correction: v4.20
On 11/8/18 10:56 PM, Frank Rowand wrote: > Hi Rob, > > Please pull the changes to add the overlay validation checks. > > This is the v7 version of the patch series. > > -Frank > > > The following changes since commit 651022382c7f8da46cb4872a545ee1da6d097d2a: > > Linux 4.20-rc1 (2018-11-04 15:37:52 -0800) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/frowand/linux.git > tags/kfree_validate_v7-for-4.20 > > for you to fetch changes up to eeb07c573ec307c53fe2f6ac6d8d11c261f64006: > > of: unittest: initialize args before calling of_*parse_*() (2018-11-08 > 22:12:37 -0800) > > > Add checks to (1) overlay apply process and (2) memory freeing > triggered by overlay release. The checks are intended to detect > possible memory leaks and invalid overlays. > > The checks revealed bugs in existing code. Fixed the bugs. > > While fixing bugs, noted other issues, which are fixed in > separate patches. > > > Frank Rowand (17): > of: overlay: add tests to validate kfrees from overlay removal > of: overlay: add missing of_node_put() after add new node to changeset > of: overlay: add missing of_node_get() in __of_attach_node_sysfs > powerpc/pseries: add of_node_put() in dlpar_detach_node() > of: overlay: use prop add changeset entry for property in new nodes > of: overlay: do not duplicate properties from overlay for new nodes > of: overlay: reorder fields in struct fragment > of: overlay: validate overlay properties #address-cells and #size-cells > of: overlay: make all pr_debug() and pr_err() messages unique > of: overlay: test case of two fragments adding same node > of: overlay: check prevents multiple fragments add or delete same node > of: overlay: check prevents multiple fragments touching same property > of: unittest: remove unused of_unittest_apply_overlay() argument > of: overlay: set node fields from properties when add new overlay node > of: unittest: allow base devicetree to have symbol metadata > of: unittest: find overlays[] entry by name instead of index > of: unittest: initialize args before calling of_*parse_*() > > arch/powerpc/platforms/pseries/dlpar.c | 2 + > drivers/of/dynamic.c | 59 - > drivers/of/kobj.c | 4 +- > drivers/of/overlay.c | 292 > - > drivers/of/unittest-data/Makefile | 2 + > .../of/unittest-data/overlay_bad_add_dup_node.dts | 28 ++ > .../of/unittest-data/overlay_bad_add_dup_prop.dts | 24 ++ > drivers/of/unittest-data/overlay_base.dts | 1 + > drivers/of/unittest.c | 96 +-- > include/linux/of.h | 21 +- > 10 files changed, 432 insertions(+), 97 deletions(-) > create mode 100644 drivers/of/unittest-data/overlay_bad_add_dup_node.dts > create mode 100644 drivers/of/unittest-data/overlay_bad_add_dup_prop.dts >
[GIT PULL] of: overlay: validation checks, subsequent fixes for v20
Hi Rob, Please pull the changes to add the overlay validation checks. This is the v7 version of the patch series. -Frank The following changes since commit 651022382c7f8da46cb4872a545ee1da6d097d2a: Linux 4.20-rc1 (2018-11-04 15:37:52 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/frowand/linux.git tags/kfree_validate_v7-for-4.20 for you to fetch changes up to eeb07c573ec307c53fe2f6ac6d8d11c261f64006: of: unittest: initialize args before calling of_*parse_*() (2018-11-08 22:12:37 -0800) Add checks to (1) overlay apply process and (2) memory freeing triggered by overlay release. The checks are intended to detect possible memory leaks and invalid overlays. The checks revealed bugs in existing code. Fixed the bugs. While fixing bugs, noted other issues, which are fixed in separate patches. Frank Rowand (17): of: overlay: add tests to validate kfrees from overlay removal of: overlay: add missing of_node_put() after add new node to changeset of: overlay: add missing of_node_get() in __of_attach_node_sysfs powerpc/pseries: add of_node_put() in dlpar_detach_node() of: overlay: use prop add changeset entry for property in new nodes of: overlay: do not duplicate properties from overlay for new nodes of: overlay: reorder fields in struct fragment of: overlay: validate overlay properties #address-cells and #size-cells of: overlay: make all pr_debug() and pr_err() messages unique of: overlay: test case of two fragments adding same node of: overlay: check prevents multiple fragments add or delete same node of: overlay: check prevents multiple fragments touching same property of: unittest: remove unused of_unittest_apply_overlay() argument of: overlay: set node fields from properties when add new overlay node of: unittest: allow base devicetree to have symbol metadata of: unittest: find overlays[] entry by name instead of index of: unittest: initialize args before calling of_*parse_*() arch/powerpc/platforms/pseries/dlpar.c | 2 + drivers/of/dynamic.c | 59 - drivers/of/kobj.c | 4 +- drivers/of/overlay.c | 292 - drivers/of/unittest-data/Makefile | 2 + .../of/unittest-data/overlay_bad_add_dup_node.dts | 28 ++ .../of/unittest-data/overlay_bad_add_dup_prop.dts | 24 ++ drivers/of/unittest-data/overlay_base.dts | 1 + drivers/of/unittest.c | 96 +-- include/linux/of.h | 21 +- 10 files changed, 432 insertions(+), 97 deletions(-) create mode 100644 drivers/of/unittest-data/overlay_bad_add_dup_node.dts create mode 100644 drivers/of/unittest-data/overlay_bad_add_dup_prop.dts
Re: [PATCH v6 00/18] of: overlay: validation checks, subsequent fixes
On 11/7/18 4:09 AM, Michael Ellerman wrote: > Frank Rowand writes: > >> Hi Michael, Ben, Paul, >> >> Do you know if anyone has tried this series on PowerPC? > > I have. No obvious breakage. > > My test does a loop of adding and removing multiple CPUs multiple times, > and in the past that has uncovered refcounting bugs. So I don't think > we're leaking any with this series applied. > > I used the tracepoint patch to keep an eye on the refcounts :) > > https://patchwork.ozlabs.org/patch/751602/ > > > I'm happy for this series to go into linux-next where it should get some > more testing. > > cheers > Thanks for the reviews and testing. -Frank
Re: [PATCH v6 04/18] powerpc/pseries: add of_node_put() in dlpar_detach_node()
On 11/7/18 4:23 AM, Michael Ellerman wrote: > frowand.l...@gmail.com writes: > >> From: Frank Rowand >> >> "of: overlay: add missing of_node_get() in __of_attach_node_sysfs" > > It would be clearer if you said 'The previous commit "of: overlay ..." Will fix. >> added a missing of_node_get() to __of_attach_node_sysfs(). This >> results in a refcount imbalance for nodes attached with >> dlpar_attach_node(). The calling sequence from dlpar_attach_node() >> to __of_attach_node_sysfs() is: >> >>dlpar_attach_node() >> of_attach_node() >> __of_attach_node_sysfs() >> >> Tested-by: Alan Tull >> Signed-off-by: Frank Rowand >> --- >> >> * UNTESTED. I need people with the affected PowerPC systems >> *(systems that dynamically allocate and deallocate >> *devicetree nodes) to test this patch. > > This looks OK to me in light of the previous patch. > > Acked-by: Michael Ellerman > > It also means dlpar_detach_node() is again behaving as described in the > comment to of_detach_node(). > > It would be good to make mention of: > > Fixes: 68baf692c435 ("powerpc/pseries: Fix of_node_put() underflow during > DLPAR remove") > > Which removed an of_node_put() in the exact same place for different > reasons. OK. -Frank > > cheers > >> diff --git a/arch/powerpc/platforms/pseries/dlpar.c >> b/arch/powerpc/platforms/pseries/dlpar.c >> index 7625546caefd..17958043e7f7 100644 >> --- a/arch/powerpc/platforms/pseries/dlpar.c >> +++ b/arch/powerpc/platforms/pseries/dlpar.c >> @@ -270,6 +270,8 @@ int dlpar_detach_node(struct device_node *dn) >> if (rc) >> return rc; >> >> +of_node_put(dn); >> + >> return 0; >> } >> >> -- >> Frank Rowand >
Re: [PATCH v6 03/18] of: overlay: add missing of_node_get() in __of_attach_node_sysfs
On 11/7/18 4:14 AM, Michael Ellerman wrote: > frowand.l...@gmail.com writes: > >> From: Frank Rowand >> >> There is a matching of_node_put() in __of_detach_node_sysfs() >> >> Remove misleading comment from function header comment for >> of_detach_node(). >> >> This patch may result in memory leaks from code that directly calls >> the dynamic node add and delete functions directly instead of >> using changesets. >> >> Tested-by: Alan Tull >> Signed-off-by: Frank Rowand > > This seems sensible to me. I guess we could argue about whether the > sysfs code needs its own reference, but it certainly doesn't hurt that > it does, as long as it's handled symmetrically - which it is now. > > Acked-by: Michael Ellerman (powerpc) > >> --- >> >> This patch should result in powerpc systems that dynamically >> allocate a node, then later deallocate the node to have a >> memory leak when the node is deallocated. >> >> The next patch in the series will fix the leak. > > I think this should go in the changelog, it's useful information that we > don't want to lose track of once this is applied. Will do. -Frank > > Either that or we actually squash the two patches together when applying > to avoid the bisection break. > > cheers > >> drivers/of/dynamic.c | 3 --- >> drivers/of/kobj.c| 4 +++- >> 2 files changed, 3 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c >> index 12c3f9a15e94..146681540487 100644 >> --- a/drivers/of/dynamic.c >> +++ b/drivers/of/dynamic.c >> @@ -272,9 +272,6 @@ void __of_detach_node(struct device_node *np) >> >> /** >> * of_detach_node() - "Unplug" a node from the device tree. >> - * >> - * The caller must hold a reference to the node. The memory associated with >> - * the node is not freed until its refcount goes to zero. >> */ >> int of_detach_node(struct device_node *np) >> { >> diff --git a/drivers/of/kobj.c b/drivers/of/kobj.c >> index 7a0a18980b98..c72eef988041 100644 >> --- a/drivers/of/kobj.c >> +++ b/drivers/of/kobj.c >> @@ -133,6 +133,9 @@ int __of_attach_node_sysfs(struct device_node *np) >> } >> if (!name) >> return -ENOMEM; >> + >> +of_node_get(np); >> + >> rc = kobject_add(&np->kobj, parent, "%s", name); >> kfree(name); >> if (rc) >> @@ -159,6 +162,5 @@ void __of_detach_node_sysfs(struct device_node *np) >> kobject_del(&np->kobj); >> } >> >> -/* finally remove the kobj_init ref */ >> of_node_put(np); >> } >> -- >> Frank Rowand >
Re: [PATCH v6 07/18] of: dynamic: change type of of_{at,de}tach_node() to void
On 11/7/18 4:08 AM, Michael Ellerman wrote: > frowand.l...@gmail.com writes: > >> From: Frank Rowand >> >> of_attach_node() and of_detach_node() always return zero, so >> their return value is meaningless. > > But should they always return zero? > > At least __of_attach_node_sysfs() can fail in several ways. Sigh. And of_reconfig_notify() can fail. And at one point in the history the return value of of_reconfig_notify() was returned by of_attach_node() if of_reconfig_notify() failed. > And there's also this in __of_detach_node() which should probably be > returning an error: > > if (WARN_ON(of_node_check_flag(np, OF_DETACHED))) > return; > > > Seems to me we should instead be fixing these to propagate errors, > rather than hiding them? The history of how of_attach_node() stopped propagating errors is a bit more complex than I want to dig into at the moment. So I'll drop this patch from the series and add investigating this onto my todo list. I suspect that the result of investigating will be that error return values should not be ignored in of_attach_node() and of_detach_node(), but should instead be propagated to the callers, as you suggest. -Frank > > cheers >
Re: [PATCH v6 00/18] of: overlay: validation checks, subsequent fixes
Hi Michael, Ben, Paul, Do you know if anyone has tried this series on PowerPC? Thanks, -Frank On 11/5/18 11:24 AM, Rob Herring wrote: > On Mon, Nov 5, 2018 at 9:26 AM wrote: >> >> From: Frank Rowand >> >> Add checks to (1) overlay apply process and (2) memory freeing >> triggered by overlay release. The checks are intended to detect >> possible memory leaks and invalid overlays. >> >> The checks revealed bugs in existing code. Fixed the bugs. >> >> While fixing bugs, noted other issues, which are fixed in >> separate patches. >> >> * Powerpc folks: I was not able to test the patches that >> * directly impact Powerpc systems that use dynamic >> * devicetree. Please review that code carefully and >> * test. The specific patches are: 03/16, 04/16, 07/16 > > I'm waiting for this to happen. Send me a pull req when it does or > when you give up waiting for a response. > > Rob >
Re: [PATCH v5 00/18] of: overlay: validation checks, subsequent fixes
On 10/18/18 15:46, frowand.l...@gmail.com wrote: > From: Frank Rowand > > Add checks to (1) overlay apply process and (2) memory freeing > triggered by overlay release. The checks are intended to detect > possible memory leaks and invalid overlays. > > The checks revealed bugs in existing code. Fixed the bugs. > > While fixing bugs, noted other issues, which are fixed in > separate patches. git version of the series: git://git.kernel.org/pub/scm/linux/kernel/git/frowand/linux.git $ git checkout v4.19-rc1--kfree_validate--v5 $ git log --oneline v4.19-rc1.. 62e8f28bb14b of: unittest: initialize args before calling of_*parse_*() cc8b630f0c7f of: unittest: find overlays[] entry by name instead of index b80a8e974e0f of: unittest: allow base devicetree to have symbol metadata bbcd6ead8e36 of: overlay: set node fields from properties when add new overlay node e02d06f99646 of: unittest: remove unused of_unittest_apply_overlay() argument 484ba7f7eb4a of: overlay: check prevents multiple fragments touching same property 4640b81a605b of: overlay: check prevents multiple fragments add or delete same node 698f942ee230 of: overlay: test case of two fragments adding same node 5fe758e00f1f of: overlay: make all pr_debug() and pr_err() messages unique 868c6f70eed5 of: overlay: validate overlay properties #address-cells and #size-cells 06bc44ce477f of: overlay: reorder fields in struct fragment 584f4537377c of: dynamic: change type of of_{at,de}tach_node() to void 54f30ea3bf65 of: overlay: do not duplicate properties from overlay for new nodes ad4180c300fc of: overlay: use prop add changeset entry for property in new nodes b1bdca739700 powerpc/pseries: add of_node_put() in dlpar_detach_node() 8e0290d5cb62 of: overlay: add missing of_node_get() in __of_attach_node_sysfs 93e221495a9f of: overlay: add missing of_node_put() after add new node to changeset 86043f08e539 of: overlay: add tests to validate kfrees from overlay removal
Re: [PATCH v4 09/18] of: overlay: validate overlay properties #address-cells and #size-cells
On 10/18/18 11:13, Rob Herring wrote: > On Mon, Oct 15, 2018 at 07:37:29PM -0700, frowand.l...@gmail.com wrote: >> From: Frank Rowand >> >> If overlay properties #address-cells or #size-cells are already in >> the live devicetree for any given node, then the values in the >> overlay must match the values in the live tree. >> >> If the properties are already in the live tree then there is no >> need to create a changeset entry to add them since they must >> have the same value. This reduces the memory used by the >> changeset and eliminates a possible memory leak. This is >> verified by 12 fewer warnings during the devicetree unittest, >> as the possible memory leak warnings about #address-cells and > > Still missing the rest... > > And what about my other comments too? Apologies, paper bag. Will fix all. -Frank > >> >> Signed-off-by: Frank Rowand >> --- >> Changes since v3: >> - for errors of an overlay changing the value of #size-cells or >> #address-cells, return -EINVAL so that overlay apply will fail >> - for errors of an overlay changing the value of #size-cells or >> #address-cells, make the message more direct. >> Old message: >> OF: overlay: ERROR: overlay and/or live tree #size-cells invalid in >> node /soc/base_fpga_region >> New message: >> OF: overlay: ERROR: changing value of >> /soc/base_fpga_region/#size-cells not allowed >> >> drivers/of/overlay.c | 42 +++--- >> 1 file changed, 39 insertions(+), 3 deletions(-) >
Re: [PATCH v4 04/18] powerpc/pseries: add of_node_put() in dlpar_detach_node()
On 10/18/18 10:09, Rob Herring wrote: > On Mon, Oct 15, 2018 at 07:37:24PM -0700, frowand.l...@gmail.com wrote: >> From: Frank Rowand >> >> "of: overlay: add missing of_node_get() in __of_attach_node_sysfs" >> added a missing of_node_get() to __of_attach_node_sysfs(). This >> results in a refcount imbalance for nodes attached with >> dlpar_attach_node(). The calling sequence from dlpar_attach_node() >> to __of_attach_node_sysfs() is: >> >>dlpar_attach_node() >> of_attach_node() >> __of_attach_node_sysfs() > > IIRC, there's a long standing item in the todo (Grant's) to convert the > open coded dlpar code. Maybe you want to do that first? I'd like to avoid extra delays to getting the current (with necesary fixes) series accepted because the series is rather intrusive and could have conflicts with other patches. I'm also worried that I don't have access to any of the systems that use the dynamic overlay code, and I don't have any way to test the changes. Can we encourage the users of this code to convert the open coded dlpar code? >> >> Signed-off-by: Frank Rowand >> --- >> >> * UNTESTED. I need people with the affected PowerPC systems >> *(systems that dynamically allocate and deallocate >> *devicetree nodes) to test this patch. > > Can't we write a test case that does the same thing? I could write a simplistic test case, but I'm concerned that simplistic is not anywhere near sufficient. And my test case would reflect the same assumptions I already have when I wrote this patch. >> >> arch/powerpc/platforms/pseries/dlpar.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/arch/powerpc/platforms/pseries/dlpar.c >> b/arch/powerpc/platforms/pseries/dlpar.c >> index a0b20c03f078..e3010b14aea5 100644 >> --- a/arch/powerpc/platforms/pseries/dlpar.c >> +++ b/arch/powerpc/platforms/pseries/dlpar.c >> @@ -272,6 +272,8 @@ int dlpar_detach_node(struct device_node *dn) >> if (rc) >> return rc; >> >> +of_node_put(dn); >> + >> return 0; >> } >> >> -- >> Frank Rowand >> >
Re: [PATCH v4 02/18] of: overlay: add missing of_node_put() after add new node to changeset
On 10/18/18 10:05, Rob Herring wrote: > On Mon, Oct 15, 2018 at 07:37:22PM -0700, frowand.l...@gmail.com wrote: >> From: Frank Rowand >> >> The refcount of a newly added overlay node decrements to one >> (instead of zero) when the overlay changeset is destroyed. This >> change will cause the final decrement be to zero. >> >> After applying this patch, new validation warnings will be >> reported from the devicetree unittest during boot due to >> a pre-existing devicetree bug. The warnings will be similar to: >> >> OF: ERROR: memory leak of_node_release() overlay node >> /testcase-data/overlay-node/test-bus/test-unittest4 before free overlay >> changeset > > Same comment on formatting. OK. >> >> This pre-existing devicetree bug will also trigger a WARN_ONCE() from >> refcount_sub_and_test_checked() when an overlay changeset is >> destroyed without having first been applied. This scenario occurs >> when an error in the overlay is detected during the overlay changeset >> creation: >> >> WARNING: CPU: 0 PID: 1 at lib/refcount.c:187 >> refcount_sub_and_test_checked+0xa8/0xbc >> refcount_t: underflow; use-after-free. >> >> (unwind_backtrace) from (show_stack+0x10/0x14) >> (show_stack) from (dump_stack+0x6c/0x8c) >> (dump_stack) from (__warn+0xdc/0x104) >> (__warn) from (warn_slowpath_fmt+0x44/0x6c) >> (warn_slowpath_fmt) from (refcount_sub_and_test_checked+0xa8/0xbc) >> (refcount_sub_and_test_checked) from (kobject_put+0x24/0x208) >> (kobject_put) from (of_changeset_destroy+0x2c/0xb4) >> (of_changeset_destroy) from (free_overlay_changeset+0x1c/0x9c) >> (free_overlay_changeset) from (of_overlay_remove+0x284/0x2cc) >> (of_overlay_remove) from >> (of_unittest_apply_revert_overlay_check.constprop.4+0xf8/0x1e8) >> (of_unittest_apply_revert_overlay_check.constprop.4) from >> (of_unittest_overlay+0x960/0xed8) >> (of_unittest_overlay) from (of_unittest+0x1cc4/0x2138) >> (of_unittest) from (do_one_initcall+0x4c/0x28c) >> (do_one_initcall) from (kernel_init_freeable+0x29c/0x378) >> (kernel_init_freeable) from (kernel_init+0x8/0x110) >> (kernel_init) from (ret_from_fork+0x14/0x2c) >> >> Signed-off-by: Frank Rowand >> --- >> drivers/of/overlay.c | 4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c >> index 1176cb4b6e4e..32cfee68f2e3 100644 >> --- a/drivers/of/overlay.c >> +++ b/drivers/of/overlay.c >> @@ -379,7 +379,9 @@ static int add_changeset_node(struct overlay_changeset >> *ovcs, >> if (ret) >> return ret; >> >> -return build_changeset_next_level(ovcs, tchild, node); >> +ret = build_changeset_next_level(ovcs, tchild, node); >> +of_node_put(tchild); >> +return ret; >> } >> >> if (node->phandle && tchild->phandle) >> -- >> Frank Rowand >> >
Re: [PATCH v4 01/18] of: overlay: add tests to validate kfrees from overlay removal
On 10/18/18 10:03, Rob Herring wrote: > On Mon, Oct 15, 2018 at 07:37:21PM -0700, frowand.l...@gmail.com wrote: >> From: Frank Rowand >> >> Add checks: >> - attempted kfree due to refcount reaching zero before overlay >> is removed >> - properties linked to an overlay node when the node is removed >> - node refcount > one during node removal in a changeset destroy, >> if the node was created by the changeset >> >> After applying this patch, several validation warnings will be >> reported from the devicetree unittest during boot due to >> pre-existing devicetree bugs. The warnings will be similar to: >> >> OF: ERROR: of_node_release() overlay node >> /testcase-data/overlay-node/test-bus/test-unittest11/test-unittest111 >> contains unexpected properties >> OF: ERROR: memory leak - destroy cset entry: attach overlay node >> /testcase-data-2/substation@100/hvac-medium-2 expected refcount 1 instead of >> 2. of_node_get() / of_node_put() are unbalanced for this node. > > These messages could be formatted more consistently. Put the path either > at the beginning (after any prefix) or end. Beginning is more like a > compiler error. End puts what the problem is before it's off the edge of > the screen. The inconsistency makes the word flow more natural, but I agree that consistency is more important. I think I can make all the messages say the problem first, then provide the path at the end. >> Signed-off-by: Frank Rowand >> --- >> Changes since v3: >> - Add expected value of refcount for destroy cset entry error. Also >> explain the cause of the error. >> >> drivers/of/dynamic.c | 29 + >> drivers/of/overlay.c | 1 + >> include/linux/of.h | 15 ++- >> 3 files changed, 40 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c >> index f4f8ed9b5454..24c97b7a050f 100644 >> --- a/drivers/of/dynamic.c >> +++ b/drivers/of/dynamic.c >> @@ -330,6 +330,25 @@ void of_node_release(struct kobject *kobj) >> if (!of_node_check_flag(node, OF_DYNAMIC)) >> return; >> >> +if (of_node_check_flag(node, OF_OVERLAY)) { >> + >> +if (!of_node_check_flag(node, OF_OVERLAY_FREE_CSET)) { > > I worry the flags are getting unwieldy. I considered that. I think we are still ok, and I don't have a better solution than adding flag values. (I did have some Rube Goldberg variations.) > >> +/* premature refcount of zero, do not free memory */ >> +pr_err("ERROR: memory leak %s() overlay node %pOF >> before free overlay changeset\n", >> + __func__, node); >> +return; >> +} >> + >> +/* >> + * If node->properties non-empty then properties were added >> + * to this node either by different overlay that has not >> + * yet been removed, or by a non-overlay mechanism. >> + */ >> +if (node->properties) >> +pr_err("ERROR: %s() overlay node %pOF contains >> unexpected properties\n", >> + __func__, node); >> +} >> + >> property_list_free(node->properties); >> property_list_free(node->deadprops); >> >> @@ -434,6 +453,16 @@ struct device_node *__of_node_dup(const struct >> device_node *np, >> >> static void __of_changeset_entry_destroy(struct of_changeset_entry *ce) >> { >> +if (ce->action == OF_RECONFIG_ATTACH_NODE && >> +of_node_check_flag(ce->np, OF_OVERLAY)) { >> +if (kref_read(&ce->np->kobj.kref) > 1) { >> +pr_err("ERROR: memory leak - destroy cset entry: attach >> overlay node %pOF expected refcount 1 instead of %d. of_node_get() / >> of_node_put() are unbalanced for this node.\n", >> + ce->np, kref_read(&ce->np->kobj.kref)); >> +} else { >> +of_node_set_flag(ce->np, OF_OVERLAY_FREE_CSET); >> +} >> +} >> + >> of_node_put(ce->np); >> list_del(&ce->node); >> kfree(ce); >> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c >> index eda57ef12fd0..1176cb4b6e4e 100644 >> --- a/drivers/of/overlay.c >> +++ b/drivers/of/overlay.c >> @@ -373,6 +373,7 @@ static int add_changeset_node
Re: [PATCH v4 00/18] of: overlay: validation checks, subsequent fixes
On 10/16/18 02:47, Michael Ellerman wrote: > frowand.l...@gmail.com writes: > >> From: Frank Rowand >> >> Add checks to (1) overlay apply process and (2) memory freeing >> triggered by overlay release. The checks are intended to detect >> possible memory leaks and invalid overlays. >> >> The checks revealed bugs in existing code. Fixed the bugs. >> >> While fixing bugs, noted other issues, which are fixed in >> separate patches. >> >> * Powerpc folks: I was not able to test the patches that >> * directly impact Powerpc systems that use dynamic >> * devicetree. Please review that code carefully and >> * test. The specific patches are: 03/16, 04/16, 07/16 > > Hi Frank, > > Do you have this series in a git tree somewhere? > > I tried applying it on top of linux-next but hit some conflicts which I > couldn't easily resolve. > > cheers > git://git.kernel.org/pub/scm/linux/kernel/git/frowand/linux.git $ git checkout v4.19-rc1--kfree_validate--v4 $ git log --oneline v4.19-rc1.. 2ba1b7d353dd of: unittest: initialize args before calling of_*parse_*() 4f9108209f79 of: unittest: find overlays[] entry by name instead of index 353403c76ff8 of: unittest: allow base devicetree to have symbol metadata 8fc37e04a01b of: overlay: set node fields from properties when add new overlay n 05d5df0e5151 of: unittest: remove unused of_unittest_apply_overlay() argument 8c021cba757a of: overlay: check prevents multiple fragments touching same proper 797a6f66e039 of: overlay: check prevents multiple fragments add or delete same n c385e25a040d of: overlay: test case of two fragments adding same node c88fd240f0e0 of: overlay: make all pr_debug() and pr_err() messages unique 1028a215d32a of: overlay: validate overlay properties #address-cells and #size-c f1a97ef74ce4 of: overlay: reorder fields in struct fragment ffe78cf7a1fb of: dynamic: change type of of_{at,de}tach_node() to void 5f5ff8ec0c0c of: overlay: do not duplicate properties from overlay for new nodes 06e72dcb2bb0 of: overlay: use prop add changeset entry for property in new nodes a02f8d326a08 powerpc/pseries: add of_node_put() in dlpar_detach_node() e203be664330 of: overlay: add missing of_node_get() in __of_attach_node_sysfs 8eb46208e7c8 of: overlay: add missing of_node_put() after add new node to change b22067db7cf9 of: overlay: add tests to validate kfrees from overlay removal
Re: [PATCH v3 00/18] of: overlay: validation checks, subsequent fixes
On 10/15/18 12:21, Alan Tull wrote: > On Sun, Oct 14, 2018 at 7:26 PM wrote: >> >> From: Frank Rowand >> >> Add checks to (1) overlay apply process and (2) memory freeing >> triggered by overlay release. The checks are intended to detect >> possible memory leaks and invalid overlays. >> >> The checks revealed bugs in existing code. Fixed the bugs. >> >> While fixing bugs, noted other issues, which are fixed in >> separate patches. >> >> * Powerpc folks: I was not able to test the patches that >> * directly impact Powerpc systems that use dynamic >> * devicetree. Please review that code carefully and >> * test. The specific patches are: 03/16, 04/16, 07/16 >> >> FPGA folks: >> >> I made the validation checks that should result in an >> invalid live devicetree report "ERROR" and cause the overlay apply >> to fail. >> >> I made the memory leak validation tests report "WARNING" and allow >> the overlay apply to complete successfully. Please let me know >> if you encounter the warnings. There are at least two paths >> forward to deal with the cases that trigger the warning: (1) change >> the warning to an error and fail the overlay apply, or (2) find a >> way to detect the potential memory leaks and free the memory >> appropriately. > > I reran my FPGA testing. The strings are fixed, no longer NULL. I Thanks for the further testing! > have functionality back, my test passes now. I'm seeing the intended > warnings about any properties added to existing nodes. That includes > warnings about added symbols. Below is a simplified part to show some > of what I'm seeing. > > By the way my testing is all using Pantelis' DT overlay configfs interface. > > root@arria10:~# ./apply-static-region.sh > > Applying dtbo: socfpga_arria10_socdk_sdmmc_ghrd_ovl_ext_cfg.dtb > > [ 1821.088640] OF: overlay: WARNING: add_changeset_property(), memory > leak will occur if overlay removed. Property: > /soc/base_fpga_region/ranges > [ 1821.103307] OF: overlay: WARNING: add_changeset_property(), memory > leak will occur if overlay removed. Property: > /soc/base_fpga_region/external-fpga-config > [ 1821.117359] OF: overlay: WARNING: add_changeset_property(), memory > leak will occur if overlay removed. Property: > /soc/base_fpga_region/clocks > [ 1821.130130] OF: overlay: WARNING: add_changeset_property(), memory > leak will occur if overlay removed. Property: > /soc/base_fpga_region/clock-names > [ 1821.143449] OF: overlay: WARNING: add_changeset_property(), memory > leak will occur if overlay removed. Property: /__symbols__/clk_0 > [ 1821.155357] OF: overlay: WARNING: add_changeset_property(), memory > leak will occur if overlay removed. Property: /__symbols__/ILC > [ 1821.167074] OF: overlay: WARNING: add_changeset_property(), memory > leak will occur if overlay removed. Property: > /__symbols__/freeze_controller_0 > [ 1821.180171] OF: overlay: WARNING: add_changeset_property(), memory > leak will occur if overlay removed. Property: > /__symbols__/sysid_qsys_0 > [ 1821.192662] OF: overlay: WARNING: add_changeset_property(), memory > leak will occur if overlay removed. Property: /__symbols__/led_pio > [ 1821.204720] OF: overlay: WARNING: add_changeset_property(), memory > leak will occur if overlay removed. Property: /__symbols__/button_pio > [ 1821.217034] OF: overlay: WARNING: add_changeset_property(), memory > leak will occur if overlay removed. Property: /__symbols__/dipsw_pio > [ 1821.231977] of-fpga-region soc:base_fpga_region:fpga_pr_region0: > FPGA Region probed > [ 1821.240144] altera_freeze_br ff200450.freeze_controller: fpga > bridge [freeze] registered > > root@arria10:~# ./apply-static-region.sh root@arria10:~# rmdir > /sys/kernel/config/device-tree/overlays/1-socfpga_arria10_socdk_sdmmc_ghrd_ovl_ext_cfg.dtb > > [ 1823.805564] OF: ERROR: memory leak - destroy cset entry: attach > overlay node /soc/base_fpga_region/clk_0 with refcount 2 That is indicating that an unbalanced of_node_get() / of_node_put() exists for that node. I'll have to update that message to be more explicit about that. -Frank > > Alan > > > Alan > > > >> >> ALL people: >> >> The validations do _not_ address another major concern I have with >> releasing overlays, which is use after free errors. >> >> Changes since v2: >> >> - 13/18: Use continue to reduce indentation in find_dup_cset_node_entry() >> and find_dup_cset_prop() >> >> Changes since v1: >> >> - move patch 16/16 to 17/18 >
Re: [PATCH v3 09/18] of: overlay: validate overlay properties #address-cells and #size-cells
On 10/15/18 12:01, Alan Tull wrote: > On Sun, Oct 14, 2018 at 7:26 PM wrote: >> >> From: Frank Rowand >> >> If overlay properties #address-cells or #size-cells are already in >> the live devicetree for any given node, then the values in the >> overlay must match the values in the live tree. >> >> If the properties are already in the live tree then there is no >> need to create a changeset entry to add them since they must >> have the same value. This reduces the memory used by the >> changeset and eliminates a possible memory leak. This is >> verified by 12 fewer warnings during the devicetree unittest, >> as the possible memory leak warnings about #address-cells and >> >> Signed-off-by: Frank Rowand >> --- >> drivers/of/overlay.c | 38 +++--- >> 1 file changed, 35 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c >> index 272a0d1a5e18..ee66651db553 100644 >> --- a/drivers/of/overlay.c >> +++ b/drivers/of/overlay.c >> @@ -287,7 +287,12 @@ static struct property *dup_and_fixup_symbol_prop( >> * @target may be either in the live devicetree or in a new subtree that >> * is contained in the changeset. >> * >> - * Some special properties are not updated (no error returned). >> + * Some special properties are not added or updated (no error returned): >> + * "name", "phandle", "linux,phandle". >> + * >> + * Properties "#address-cells" and "#size-cells" are not updated if they >> + * are already in the live tree, but if present in the live tree, the values >> + * in the overlay must match the values in the live tree. >> * >> * Update of property in symbols node is not allowed. >> * >> @@ -300,6 +305,7 @@ static int add_changeset_property(struct >> overlay_changeset *ovcs, >> { >> struct property *new_prop = NULL, *prop; >> int ret = 0; >> + bool check_for_non_overlay_node = false; >> >> if (!of_prop_cmp(overlay_prop->name, "name") || >> !of_prop_cmp(overlay_prop->name, "phandle") || >> @@ -322,13 +328,39 @@ static int add_changeset_property(struct >> overlay_changeset *ovcs, >> if (!new_prop) >> return -ENOMEM; >> >> - if (!prop) >> + if (!prop) { >> + >> + check_for_non_overlay_node = true; >> ret = of_changeset_add_property(&ovcs->cset, target->np, >> new_prop); >> - else >> + >> + } else if (!of_prop_cmp(prop->name, "#address-cells")) { >> + > > Hi Frank, > > If we get these ERROR messages, I suggest that this function should > return an error so the overlay will be rejected. > >> + if (prop->length != 4 || new_prop->length != 4 || >> + *(u32 *)prop->value != *(u32 *)new_prop->value) > > *(u32 *)prop->value != *(u32 *)new_prop->value) { > >> + pr_err("ERROR: overlay and/or live tree >> #address-cells invalid in node %pOF\n", >> + target->np); > >ret = -EINVAL; > } > > Otherwise there is an ERROR message, but it continues trying to apply > the invalid overlay anyway and I get an oops. By adding the ret = > -EINVAL, the overlay gets rejected and the oops is avoided. Yes, that sounds good. >> + >> + } else if (!of_prop_cmp(prop->name, "#size-cells")) { >> + >> + if (prop->length != 4 || new_prop->length != 4 || >> + *(u32 *)prop->value != *(u32 *)new_prop->value) >> + pr_err("ERROR: overlay and/or live tree #size-cells >> invalid in node %pOF\n", >> + target->np); > > Add the ret = -EINVAL here also. This give me the following (if my > overlay changes #address-cells): Yes. > [ 21.167551] OF: overlay: ERROR: overlay and/or live tree > #address-cells invalid in node /soc/base_fpga_region > [ 21.177442] OF: overlay: add_changeset_property ret=-22 > [ 21.182656] create_overlay: Failed to create overlay (err=-22) > > Also, I wonder if the ERROR message could be more direct. Currently > it says the #address-cells property is invalid but that doesn't say > anything about why it's invalid. How about somethin
Re: [PATCH v3 13/18] of: overlay: check prevents multiple fragments touching same property
On 10/14/18 20:21, Frank Rowand wrote: > On 10/14/18 18:55, Joe Perches wrote: >> On Sun, 2018-10-14 at 18:52 -0700, Frank Rowand wrote: >>> On 10/14/18 18:06, Joe Perches wrote: >>>> On Sun, 2018-10-14 at 17:24 -0700, frowand.l...@gmail.com wrote: >>>>> From: Frank Rowand >>>>> >>>>> Add test case of two fragments updating the same property. After >>>>> adding the test case, the system hangs at end of boot, after >>>>> after slub stack dumps from kfree() in crypto modprobe code. >> [] >>>> I think this is worse performance than before. >>>> >>>> This now walks all entries when before it would >>>> return -EINVAL directly when it found a match. >>> >>> Yes, it is worse performance, but that is OK. >>> >>> This is a check that is done when a devicetree overlay is applied. >>> If an error occurs then that means that the overlay was incorrectly >>> specified. The file drivers/of/unittest-data/overlay_bad_add_dup_prop.dts >>> in this patch provides an example of how a bad overlay can be created. >>> >>> Once an error was detected, the check could return immediately, or it >>> could continue to give a complete list of detected errors. I chose to >>> give the complete list of detected errors. >> >> Swell. Please describe that in the commit message. > > If a version 4 of the series is created I will update the commit > message. As a stand alone item I do not think it is worth a > new version. And there will be a version 4, so I will update the commit message. -Frank
Re: [PATCH v3 13/18] of: overlay: check prevents multiple fragments touching same property
On 10/14/18 18:55, Joe Perches wrote: > On Sun, 2018-10-14 at 18:52 -0700, Frank Rowand wrote: >> On 10/14/18 18:06, Joe Perches wrote: >>> On Sun, 2018-10-14 at 17:24 -0700, frowand.l...@gmail.com wrote: >>>> From: Frank Rowand >>>> >>>> Add test case of two fragments updating the same property. After >>>> adding the test case, the system hangs at end of boot, after >>>> after slub stack dumps from kfree() in crypto modprobe code. > [] >>> I think this is worse performance than before. >>> >>> This now walks all entries when before it would >>> return -EINVAL directly when it found a match. >> >> Yes, it is worse performance, but that is OK. >> >> This is a check that is done when a devicetree overlay is applied. >> If an error occurs then that means that the overlay was incorrectly >> specified. The file drivers/of/unittest-data/overlay_bad_add_dup_prop.dts >> in this patch provides an example of how a bad overlay can be created. >> >> Once an error was detected, the check could return immediately, or it >> could continue to give a complete list of detected errors. I chose to >> give the complete list of detected errors. > > Swell. Please describe that in the commit message. If a version 4 of the series is created I will update the commit message. As a stand alone item I do not think it is worth a new version. -Frank
Re: [PATCH v3 13/18] of: overlay: check prevents multiple fragments touching same property
On 10/14/18 18:06, Joe Perches wrote: > On Sun, 2018-10-14 at 17:24 -0700, frowand.l...@gmail.com wrote: >> From: Frank Rowand >> >> Add test case of two fragments updating the same property. After >> adding the test case, the system hangs at end of boot, after >> after slub stack dumps from kfree() in crypto modprobe code. > [] >> -static int check_changeset_dup_add_node(struct overlay_changeset *ovcs) >> +static int changeset_dup_entry_check(struct overlay_changeset *ovcs) >> { >> -struct of_changeset_entry *ce_1, *ce_2; >> -char *fn_1, *fn_2; >> -int name_match; >> +struct of_changeset_entry *ce_1; >> +int dup_entry = 0; >> >> list_for_each_entry(ce_1, &ovcs->cset.entries, node) { >> - >> -if (ce_1->action == OF_RECONFIG_ATTACH_NODE || >> -ce_1->action == OF_RECONFIG_DETACH_NODE) { >> - >> -ce_2 = ce_1; >> -list_for_each_entry_continue(ce_2, &ovcs->cset.entries, >> node) { >> -if (ce_2->action == OF_RECONFIG_ATTACH_NODE || >> -ce_2->action == OF_RECONFIG_DETACH_NODE) { >> -/* inexpensive name compare */ >> -if (!of_node_cmp(ce_1->np->full_name, >> -ce_2->np->full_name)) { >> -/* expensive full path name >> compare */ >> -fn_1 = kasprintf(GFP_KERNEL, >> "%pOF", ce_1->np); >> -fn_2 = kasprintf(GFP_KERNEL, >> "%pOF", ce_2->np); >> -name_match = !strcmp(fn_1, >> fn_2); >> -kfree(fn_1); >> -kfree(fn_2); >> -if (name_match) { >> -pr_err("ERROR: multiple >> overlay fragments add and/or delete node %pOF\n", >> - ce_1->np); >> -return -EINVAL; >> -} >> -} >> -} >> -} >> -} >> +dup_entry |= find_dup_cset_node_entry(ovcs, ce_1); >> +dup_entry |= find_dup_cset_prop(ovcs, ce_1); > > I think this is worse performance than before. > > This now walks all entries when before it would > return -EINVAL directly when it found a match. Yes, it is worse performance, but that is OK. This is a check that is done when a devicetree overlay is applied. If an error occurs then that means that the overlay was incorrectly specified. The file drivers/of/unittest-data/overlay_bad_add_dup_prop.dts in this patch provides an example of how a bad overlay can be created. Once an error was detected, the check could return immediately, or it could continue to give a complete list of detected errors. I chose to give the complete list of detected errors. -Frank
Re: [PATCH v2 12/18] of: overlay: check prevents multiple fragments add or delete same node
On 10/13/18 11:21, Frank Rowand wrote: > On 10/13/18 05:51, Joe Perches wrote: >> On Fri, 2018-10-12 at 21:53 -0700, frowand.l...@gmail.com wrote: >>> From: Frank Rowand >>> >>> Multiple overlay fragments adding or deleting the same node is not >>> supported. Replace code comment of such, with check to detect the >>> attempt and fail the overlay apply. >>> >>> Devicetree unittest where multiple fragments added the same node was >>> added in the previous patch in the series. After applying this patch >>> the unittest messages will no longer include: >>> >>>Duplicate name in motor-1, renamed to "controller#1" >>>OF: overlay: of_overlay_apply() err=0 >>>### dt-test ### of_overlay_fdt_apply() expected -22, ret=0, >>> overlay_bad_add_dup_node >>>### dt-test ### FAIL of_unittest_overlay_high_level():2419 Adding >>> overlay 'overlay_bad_add_dup_node' failed >>> >>>... >>> >>>### dt-test ### end of unittest - 210 passed, 1 failed >>> >>> but will instead include: >>> >>>OF: overlay: ERROR: multiple overlay fragments add and/or delete node >>> /testcase-data-2/substation@100/motor-1/controller >>> >>>... >>> >>>### dt-test ### end of unittest - 211 passed, 0 failed >> [] >>> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c >> [] >>> @@ -523,6 +515,54 @@ static int build_changeset_symbols_node(struct >>> overlay_changeset *ovcs, >>> } >>> >>> /** >>> + * check_changeset_dup_add_node() - changeset validation: duplicate add >>> node >>> + * @ovcs: Overlay changeset >>> + * >>> + * Check changeset @ovcs->cset for multiple add node entries for the same >>> + * node. >>> + * >>> + * Returns 0 on success, -ENOMEM if memory allocation failure, or -EINVAL >>> if >>> + * invalid overlay in @ovcs->fragments[]. >>> + */ >>> +static int check_changeset_dup_add_node(struct overlay_changeset *ovcs) >>> +{ >>> + struct of_changeset_entry *ce_1, *ce_2; >>> + char *fn_1, *fn_2; >>> + int name_match; >>> + >>> + list_for_each_entry(ce_1, &ovcs->cset.entries, node) { >>> + >>> + if (ce_1->action == OF_RECONFIG_ATTACH_NODE || >>> + ce_1->action == OF_RECONFIG_DETACH_NODE) { >>> + >>> + ce_2 = ce_1; >>> + list_for_each_entry_continue(ce_2, &ovcs->cset.entries, >>> node) { >>> + if (ce_2->action == OF_RECONFIG_ATTACH_NODE || >>> + ce_2->action == OF_RECONFIG_DETACH_NODE) { >>> + /* inexpensive name compare */ >>> + if (!of_node_cmp(ce_1->np->full_name, >>> + ce_2->np->full_name)) { >> >> A bit of odd indentation here. >> This line is normally aligned to the second ( on the line above. > > Yes, thanks. This line gets joined into a single line in version 3, so I will leave the bad formatting in patch 12 to make my life easier when moving to version 3. >> >>> + /* expensive full path name >>> compare */ >>> + fn_1 = kasprintf(GFP_KERNEL, >>> "%pOF", ce_1->np); >>> + fn_2 = kasprintf(GFP_KERNEL, >>> "%pOF", ce_2->np); >>> + name_match = !strcmp(fn_1, >>> fn_2); >>> + kfree(fn_1); >>> + kfree(fn_2); >>> + if (name_match) { >>> + pr_err("ERROR: multiple >>> overlay fragments add and/or delete node %pOF\n", >>> + ce_1->np); >>> + return -EINVAL; >>> + } >>> + } >>> + } >>> + } >>> + } >>> + } >>> + >>> + return 0; >
Re: [PATCH v2 12/18] of: overlay: check prevents multiple fragments add or delete same node
On 10/13/18 05:51, Joe Perches wrote: > On Fri, 2018-10-12 at 21:53 -0700, frowand.l...@gmail.com wrote: >> From: Frank Rowand >> >> Multiple overlay fragments adding or deleting the same node is not >> supported. Replace code comment of such, with check to detect the >> attempt and fail the overlay apply. >> >> Devicetree unittest where multiple fragments added the same node was >> added in the previous patch in the series. After applying this patch >> the unittest messages will no longer include: >> >>Duplicate name in motor-1, renamed to "controller#1" >>OF: overlay: of_overlay_apply() err=0 >>### dt-test ### of_overlay_fdt_apply() expected -22, ret=0, >> overlay_bad_add_dup_node >>### dt-test ### FAIL of_unittest_overlay_high_level():2419 Adding overlay >> 'overlay_bad_add_dup_node' failed >> >>... >> >>### dt-test ### end of unittest - 210 passed, 1 failed >> >> but will instead include: >> >>OF: overlay: ERROR: multiple overlay fragments add and/or delete node >> /testcase-data-2/substation@100/motor-1/controller >> >>... >> >>### dt-test ### end of unittest - 211 passed, 0 failed > [] >> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c > [] >> @@ -523,6 +515,54 @@ static int build_changeset_symbols_node(struct >> overlay_changeset *ovcs, >> } >> >> /** >> + * check_changeset_dup_add_node() - changeset validation: duplicate add node >> + * @ovcs: Overlay changeset >> + * >> + * Check changeset @ovcs->cset for multiple add node entries for the same >> + * node. >> + * >> + * Returns 0 on success, -ENOMEM if memory allocation failure, or -EINVAL if >> + * invalid overlay in @ovcs->fragments[]. >> + */ >> +static int check_changeset_dup_add_node(struct overlay_changeset *ovcs) >> +{ >> +struct of_changeset_entry *ce_1, *ce_2; >> +char *fn_1, *fn_2; >> +int name_match; >> + >> +list_for_each_entry(ce_1, &ovcs->cset.entries, node) { >> + >> +if (ce_1->action == OF_RECONFIG_ATTACH_NODE || >> +ce_1->action == OF_RECONFIG_DETACH_NODE) { >> + >> +ce_2 = ce_1; >> +list_for_each_entry_continue(ce_2, &ovcs->cset.entries, >> node) { >> +if (ce_2->action == OF_RECONFIG_ATTACH_NODE || >> +ce_2->action == OF_RECONFIG_DETACH_NODE) { >> +/* inexpensive name compare */ >> +if (!of_node_cmp(ce_1->np->full_name, >> +ce_2->np->full_name)) { > > A bit of odd indentation here. > This line is normally aligned to the second ( on the line above. Yes, thanks. > >> +/* expensive full path name >> compare */ >> +fn_1 = kasprintf(GFP_KERNEL, >> "%pOF", ce_1->np); >> +fn_2 = kasprintf(GFP_KERNEL, >> "%pOF", ce_2->np); >> +name_match = !strcmp(fn_1, >> fn_2); >> +kfree(fn_1); >> +kfree(fn_2); >> +if (name_match) { >> +pr_err("ERROR: multiple >> overlay fragments add and/or delete node %pOF\n", >> + ce_1->np); >> +return -EINVAL; >> +} >> +} >> +} >> +} >> +} >> +} >> + >> +return 0; >> +} > > Style trivia: > > Using inverted tests and continue would reduce indentation. Yes, thanks. -Frank > > list_for_each_entry(ce_1, &ovcs->cset.entries, node) { > if (ce_1->action != OF_RECONFIG_ATTACH_NODE && > ce_1->action != OF_RECONFIG_DETACH_NODE) > continue; > > ce_2 = ce_1; > list_for_each_entry_continue(ce_2, &ovcs->cset.entries, node) { > if (ce_2->action != OF_RECONFIG_ATTACH_NODE && &g
Re: [PATCH 05.1/16] of:overlay: missing name, phandle, linux, phandle in new nodes
On 10/11/18 12:33, Alan Tull wrote: > On Thu, Oct 11, 2018 at 12:39 AM Frank Rowand wrote: > > [resend of my messed up rejected email of a minute ago, sorry] > >> >> On 10/10/18 14:03, Frank Rowand wrote: < snip > > I understand you're quite busy with all this, but I'm wondering > whether it might be worth it go ahead and make the properties be > kernel objects also at this point. That would be an improvement for > the case of overlay properties added to non-overlay nodes, so the > lifespan of the overlay property memory can be coupled with the > properties kobj's instead of the node kobj's. > > Alan > That is one of the approaches that I am thinking about to handle the potential memory leaks from those properties. I'd like to make these changes in a step wise fashion, to let each major change get some exposure and use before moving on to the next step. Making properties into kernel objects would impact a lot of code. So not in this series. But thanks for thinking about it. -Frank
Re: [RFC PATCH v2 3/3] cpuidle/powernv: save-restore sprs in opal
+ devicetree mail list On 10/11/18 06:22, Akshay Adiga wrote: > From: Abhishek Goel > > This patch moves the saving and restoring of sprs for P9 cpuidle > from kernel to opal. > In an attempt to make the powernv idle code backward compatible, > and to some extent forward compatible, add support for pre-stop entry > and post-stop exit actions in OPAL. If a kernel knows about this > opal call, then just a firmware supporting newer hardware is required, > instead of waiting for kernel updates. > > Signed-off-by: Abhishek Goel > Signed-off-by: Akshay Adiga > --- > Changes from v1 : > - Code is rebased on Nick Piggin's v4 patch "powerpc/64s: reimplement book3s >idle code in C" > - Set a global variable "request_opal_call" to indicate that deep >states should make opal_call. > - All the states that loses hypervisor states will be handled by OPAL > - All the decision making such as identifying first thread in >the core and taking locks before restoring in such cases have also been >moved to OPAL > arch/powerpc/include/asm/opal-api.h | 4 +- > arch/powerpc/include/asm/opal.h | 3 + > arch/powerpc/include/asm/processor.h | 3 +- > arch/powerpc/kernel/idle_book3s.S | 6 +- > arch/powerpc/platforms/powernv/idle.c | 88 +-- > .../powerpc/platforms/powernv/opal-wrappers.S | 2 + > 6 files changed, 77 insertions(+), 29 deletions(-) > > diff --git a/arch/powerpc/include/asm/opal-api.h > b/arch/powerpc/include/asm/opal-api.h > index 8365353330b4..93ea1f79e295 100644 > --- a/arch/powerpc/include/asm/opal-api.h > +++ b/arch/powerpc/include/asm/opal-api.h > @@ -210,7 +210,9 @@ > #define OPAL_PCI_GET_PBCQ_TUNNEL_BAR 164 > #define OPAL_PCI_SET_PBCQ_TUNNEL_BAR 165 > #define OPAL_NX_COPROC_INIT 167 > -#define OPAL_LAST167 > +#define OPAL_IDLE_SAVE 170 > +#define OPAL_IDLE_RESTORE171 > +#define OPAL_LAST171 > > #define QUIESCE_HOLD 1 /* Spin all calls at entry */ > #define QUIESCE_REJECT 2 /* Fail all calls with > OPAL_BUSY */ > diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h > index ff3866473afe..26995e16171e 100644 > --- a/arch/powerpc/include/asm/opal.h > +++ b/arch/powerpc/include/asm/opal.h > @@ -356,6 +356,9 @@ extern int opal_handle_hmi_exception(struct pt_regs > *regs); > extern void opal_shutdown(void); > extern int opal_resync_timebase(void); > > +extern int opal_cpuidle_save(u64 psscr); > +extern int opal_cpuidle_restore(u64 psscr, u64 srr1); > + > extern void opal_lpc_init(void); > > extern void opal_kmsg_init(void); > diff --git a/arch/powerpc/include/asm/processor.h > b/arch/powerpc/include/asm/processor.h > index 822d3236ad7f..26fa6c1836f4 100644 > --- a/arch/powerpc/include/asm/processor.h > +++ b/arch/powerpc/include/asm/processor.h > @@ -510,7 +510,8 @@ static inline unsigned long get_clean_sp(unsigned long > sp, int is_32) > > /* asm stubs */ > extern unsigned long isa300_idle_stop_noloss(unsigned long psscr_val); > -extern unsigned long isa300_idle_stop_mayloss(unsigned long psscr_val); > +extern unsigned long isa300_idle_stop_mayloss(unsigned long psscr_val, > + bool request_opal_call); > extern unsigned long isa206_idle_insn_mayloss(unsigned long type); > > extern unsigned long cpuidle_disable; > diff --git a/arch/powerpc/kernel/idle_book3s.S > b/arch/powerpc/kernel/idle_book3s.S > index ffdee1ab4388..a2014d152035 100644 > --- a/arch/powerpc/kernel/idle_book3s.S > +++ b/arch/powerpc/kernel/idle_book3s.S > @@ -52,14 +52,16 @@ _GLOBAL(isa300_idle_stop_noloss) > _GLOBAL(isa300_idle_stop_mayloss) > mtspr SPRN_PSSCR,r3 > std r1,PACAR1(r13) > - mflrr4 > + mflrr7 > mfcrr5 > /* use stack red zone rather than a new frame */ > addir6,r1,-INT_FRAME_SIZE > SAVE_GPR(2, r6) > SAVE_NVGPRS(r6) > - std r4,_LINK(r6) > + std r7,_LINK(r6) > std r5,_CCR(r6) > + cmpwi r4,0 > + bne opal_cpuidle_save > PPC_STOP > b . /* catch bugs */ > > diff --git a/arch/powerpc/platforms/powernv/idle.c > b/arch/powerpc/platforms/powernv/idle.c > index 681a23a066bb..bcfe08022e65 100644 > --- a/arch/powerpc/platforms/powernv/idle.c > +++ b/arch/powerpc/platforms/powernv/idle.c > @@ -171,6 +171,7 @@ static void pnv_fastsleep_workaround_apply(void *info) > > static bool power7_fastsleep_workaround_entry = true; > static bool power7_fastsleep_workaround_exit = true; > +static bool request_opal_call = false; > > /* > * Used to store fastsleep workaround state > @@ -604,6 +605,7 @@ static unsigned long power9_idle_stop(unsigned long > psscr, bool mmu_on) > unsigned long mmcr0 = 0; > struct p9_sprs sprs; >
Re: [RFC PATCH v2 2/3] powernv/cpuidle: Pass pointers instead of values to stop loop
+ devicetree mail list On 10/11/18 06:22, Akshay Adiga wrote: > Passing pointer to the pnv_idle_state instead of psscr value and mask. > This helps us to pass more information to the stop loop. This will help to > figure out the method to enter/exit idle state. > > Signed-off-by: Akshay Adiga > > --- > Changes from v1 : > - Code is rebased on Nick Piggin's v4 patch "powerpc/64s: reimplement book3s >idle code in C" > > arch/powerpc/include/asm/processor.h | 5 ++- > arch/powerpc/platforms/powernv/idle.c | 47 ++- > drivers/cpuidle/cpuidle-powernv.c | 15 +++-- > 3 files changed, 24 insertions(+), 43 deletions(-) > > diff --git a/arch/powerpc/include/asm/processor.h > b/arch/powerpc/include/asm/processor.h > index 936795acba48..822d3236ad7f 100644 > --- a/arch/powerpc/include/asm/processor.h > +++ b/arch/powerpc/include/asm/processor.h > @@ -43,6 +43,7 @@ > #include > #include > #include > +#include > > /* We do _not_ want to define new machine types at all, those must die > * in favor of using the device-tree > @@ -518,9 +519,7 @@ enum idle_boot_override {IDLE_NO_OVERRIDE = 0, > IDLE_POWERSAVE_OFF}; > extern int powersave_nap;/* set if nap mode can be used in idle loop */ > > extern void power7_idle_type(unsigned long type); > -extern void power9_idle_type(unsigned long stop_psscr_val, > - unsigned long stop_psscr_mask); > - > +extern void power9_idle_type(struct pnv_idle_states_t *state); > extern void flush_instruction_cache(void); > extern void hard_reset_now(void); > extern void poweroff_now(void); > diff --git a/arch/powerpc/platforms/powernv/idle.c > b/arch/powerpc/platforms/powernv/idle.c > index 755918402591..681a23a066bb 100644 > --- a/arch/powerpc/platforms/powernv/idle.c > +++ b/arch/powerpc/platforms/powernv/idle.c > @@ -44,8 +44,7 @@ int nr_pnv_idle_states; > * The default stop state that will be used by ppc_md.power_save > * function on platforms that support stop instruction. > */ > -static u64 pnv_default_stop_val; > -static u64 pnv_default_stop_mask; > +struct pnv_idle_states_t *pnv_default_state; > static bool default_stop_found; > > /* > @@ -72,9 +71,7 @@ const int nr_known_versions = 1; > * psscr value and mask of the deepest stop idle state. > * Used when a cpu is offlined. > */ > -static u64 pnv_deepest_stop_psscr_val; > -static u64 pnv_deepest_stop_psscr_mask; > -static u64 pnv_deepest_stop_flag; > +static struct pnv_idle_states_t *pnv_deepest_state; > static bool deepest_stop_found; > > static unsigned long power7_offline_type; > @@ -96,7 +93,7 @@ static int pnv_save_sprs_for_deep_states(void) > uint64_t hid5_val = mfspr(SPRN_HID5); > uint64_t hmeer_val = mfspr(SPRN_HMEER); > uint64_t msr_val = MSR_IDLE; > - uint64_t psscr_val = pnv_deepest_stop_psscr_val; > + uint64_t psscr_val = pnv_deepest_state->psscr_val; > > for_each_present_cpu(cpu) { > uint64_t pir = get_hard_smp_processor_id(cpu); > @@ -820,17 +817,15 @@ static unsigned long power9_offline_stop(unsigned long > psscr) > return srr1; > } > > -static unsigned long __power9_idle_type(unsigned long stop_psscr_val, > - unsigned long stop_psscr_mask) > +static unsigned long __power9_idle_type(struct pnv_idle_states_t *state) > { > unsigned long psscr; > unsigned long srr1; > > if (!prep_irq_for_idle_irqsoff()) > return 0; > - > psscr = mfspr(SPRN_PSSCR); > - psscr = (psscr & ~stop_psscr_mask) | stop_psscr_val; > + psscr = (psscr & ~state->psscr_mask) | state->psscr_val; > > __ppc64_runlatch_off(); > srr1 = power9_idle_stop(psscr, true); > @@ -841,12 +836,10 @@ static unsigned long __power9_idle_type(unsigned long > stop_psscr_val, > return srr1; > } > > -void power9_idle_type(unsigned long stop_psscr_val, > - unsigned long stop_psscr_mask) > +void power9_idle_type(struct pnv_idle_states_t *state) > { > unsigned long srr1; > - > - srr1 = __power9_idle_type(stop_psscr_val, stop_psscr_mask); > + srr1 = __power9_idle_type(state); > irq_set_pending_from_srr1(srr1); > } > > @@ -855,7 +848,7 @@ void power9_idle_type(unsigned long stop_psscr_val, > */ > void power9_idle(void) > { > - power9_idle_type(pnv_default_stop_val, pnv_default_stop_mask); > + power9_idle_type(pnv_default_state); > } > > #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE > @@ -974,8 +967,8 @@ unsigned long pnv_cpu_offline(unsigned int cpu) > unsigned long psscr; > > psscr = mfspr(SPRN_PSSCR); > - psscr = (psscr & ~pnv_deepest_stop_psscr_mask) | > - pnv_deepest_stop_psscr_val; > + psscr = (psscr & ~pnv_deepest_state->psscr_mask) | > + pnv_deepest_state->psscr_val; > s
Re: [RFC PATCH v2 1/3] cpuidle/powernv: Add support for states with ibm,cpuidle-state-v1
+ devicetree mail list On 10/11/18 06:22, Akshay Adiga wrote: > This patch adds support for new device-tree format for idle state > description. > > Previously if a older kernel runs on a newer firmware, it may enable > all available states irrespective of its capability of handling it. > New device tree format adds a compatible flag, so that only kernel > which has the capability to handle the version of stop state will enable > it. > > Older kernel will still see stop0 and stop0_lite in older format and we > will depricate it after some time. > > 1) Idea is to bump up the version in firmware if we find a bug or > regression in stop states. A fix will be provided in linux which would > now know about the bumped up version of stop states, where as kernel > without fixes would ignore the states. > > 2) Slowly deprecate cpuidle /cpuhotplug threshold which is hard-coded > into cpuidle-powernv driver. Instead use compatible strings to indicate > if idle state is suitable for cpuidle and hotplug. > > New idle state device tree format : >power-mgt { > ... > ibm,enabled-stop-levels = <0xec00>; > ibm,cpu-idle-state-psscr-mask = <0x0 0x3003ff 0x0 0x3003ff>; > ibm,cpu-idle-state-latencies-ns = <0x3e8 0x7d0>; > ibm,cpu-idle-state-psscr = <0x0 0x330 0x0 0x300330>; > ibm,cpu-idle-state-flags = <0x10 0x101000>; > ibm,cpu-idle-state-residency-ns = <0x2710 0x4e20>; > ibm,idle-states { > stop4 { > flags = <0x207000>; > compatible = "ibm,state-v1", > "opal-supported"; > type = "cpuidle"; > psscr-mask = <0x0 0x3003ff>; > handle = <0x102>; > latency-ns = <0x186a0>; > residency-ns = <0x989680>; > psscr = <0x0 0x300374>; > }; > ... > stop11 { > ... > compatible = "ibm,state-v1", > "opal-supported"; > type = "cpuoffline"; > ... > }; > }; > type strings : > "cpuidle" : indicates it should be used by cpuidle-driver > "cpuoffline" : indicates it should be used by hotplug driver > > compatible strings : > "ibm,state-v1" : kernel checks if it knows about this version > "opal-supported" : indicates kernel can fall back to use opal > for stop-transitions > > Signed-off-by: Akshay Adiga > --- > > Changes from v1 : > - Code is rebased on Nick Piggin's v4 patch "powerpc/64s: reimplement book3s >idle code in C" > - Moved "cpuidle" and "cpuoffline" as seperate property called >"type" > > > arch/powerpc/include/asm/cpuidle.h| 9 ++ > arch/powerpc/platforms/powernv/idle.c | 132 +- > drivers/cpuidle/cpuidle-powernv.c | 31 -- > 3 files changed, 160 insertions(+), 12 deletions(-) > > diff --git a/arch/powerpc/include/asm/cpuidle.h > b/arch/powerpc/include/asm/cpuidle.h > index 9844b3ded187..e920a15e797f 100644 > --- a/arch/powerpc/include/asm/cpuidle.h > +++ b/arch/powerpc/include/asm/cpuidle.h > @@ -70,14 +70,23 @@ > > #ifndef __ASSEMBLY__ > > +enum idle_state_type_t { > + CPUIDLE_TYPE, > + CPUOFFLINE_TYPE > +}; > + > +#define POWERNV_THRESHOLD_LATENCY_NS 20 > +#define PNV_VER_NAME_LEN32 > #define PNV_IDLE_NAME_LEN16 > struct pnv_idle_states_t { > char name[PNV_IDLE_NAME_LEN]; > + char version[PNV_VER_NAME_LEN]; > u32 latency_ns; > u32 residency_ns; > u64 psscr_val; > u64 psscr_mask; > u32 flags; > + enum idle_state_type_t type; > bool valid; > }; > > diff --git a/arch/powerpc/platforms/powernv/idle.c > b/arch/powerpc/platforms/powernv/idle.c > index 96186af9e953..755918402591 100644 > --- a/arch/powerpc/platforms/powernv/idle.c > +++ b/arch/powerpc/platforms/powernv/idle.c > @@ -54,6 +54,20 @@ static bool default_stop_found; > static u64 pnv_first_tb_loss_level = MAX_STOP_STATE + 1; > static u64 pnv_first_hv_loss_level = MAX_STOP_STATE + 1; > > + > +static int parse_dt_v1(struct device_node *np); > +struct stop_version_t { > + const char name[PNV_VER_NAME_LEN]; > + int (*parser_fn)(struct device_node *np); > +}; > +struct stop_version_t known_versions[] = { > + { > + .name = "ibm,state-v1", > + .parser_fn = parse_dt_v1, > + } > + }; > +const int nr_known_versions = 1; > + > /* > * psscr value and mask of the deepest stop idle state. > * Used when a cpu is offlined. > @@ -1195,6 +1209,77 @@ static void __init pnv_probe_idle_states(void) > supported_cpuidle_states |= pnv_idle_states[i].flags; > } > > +static int parse_dt
Re: [RFC PATCH v2 0/3] New device-tree format and Opal based idle save-restore
+ devicetree mail list On 10/11/18 06:22, Akshay Adiga wrote: > Previously if a older kernel runs on a newer firmware, it may enable > all available states irrespective of its capability of handling it. > New device tree format adds a compatible flag, so that only kernel > which has the capability to handle the version of stop state will enable > it. > > Older kernel will still see stop0 and stop0_lite in older format and we > will depricate it after some time. > > 1) Idea is to bump up the version string in firmware if we find a bug or > regression in stop states. A fix will be provided in linux which would > now know about the bumped up version of stop states, where as kernel > without fixes would ignore the states. > > 2) Slowly deprecate cpuidle/cpuhotplug threshold which is hard-coded > into cpuidle-powernv driver. Instead use compatible strings to indicate > if idle state is suitable for cpuidle and hotplug. > > New idle state device tree format : >power-mgt { > ... > ibm,enabled-stop-levels = <0xec00>; > ibm,cpu-idle-state-psscr-mask = <0x0 0x3003ff 0x0 0x3003ff>; > ibm,cpu-idle-state-latencies-ns = <0x3e8 0x7d0>; > ibm,cpu-idle-state-psscr = <0x0 0x330 0x0 0x300330>; > ibm,cpu-idle-state-flags = <0x10 0x101000>; > ibm,cpu-idle-state-residency-ns = <0x2710 0x4e20>; > ibm,idle-states { > stop4 { > flags = <0x207000>; > compatible = "ibm,state-v1", > "opal-support"; >type = "cpuidle"; > psscr-mask = <0x0 0x3003ff>; > handle = <0x102>; > latency-ns = <0x186a0>; > residency-ns = <0x989680>; > psscr = <0x0 0x300374>; > }; > ... > stop11 { > ... > compatible = "ibm,state-v1", > "opal-support"; >type = "cpuoffline"; > ... > }; > }; > > High-level parsing algorithm : > > Say Known version string = "ibm,state-v1" > > for each stop state node in device tree: > if (compatible has known version string) > kernel takes care of stop-transitions > else if (compatible has "opal-support") > OPAL takes care of stop-transitions > else > Skip All deeper states > > When a state does not have both version support and opal support, > Its possible to exit from a shallower state. Hence skipping all > deeper states. > > OPAL support for idle states > > > With this patch series, all the states that loose hypervisor state > will be handled through opal_call. > > Patch 3 adds support for Saving/restoring of SPRs and resync-timebase > in OPAL. Also all the decision making such as identifying first thread > in the core and taking locks before restoring, etc are implemented in > OPAL. > > How does it work ? > --- > > Consider a case that stop4 has a bug. We take the following steps to > mitigate the problem. > > 1) Change compatible string for stop4 in OPAL to "ibm-state-v2" and > remove "opal-supported". ship the new firmware. > The kernel ignores stop4 and all deeper states. But we will still have > shallower states. Prevents from completely disabling stop states. > > 2) Implement workaround in OPAL and add "opal-supported". Ship new firmware > The kernel uses opal for stop-transtion , which has workaround implemented. > We get stop4 and deeper states working without kernel changes and backports. > (and considerably less time) > > 3) Implement workaround in kernel and add "ibm-state-v2" as known versions > The kernel will now be able to handle stop4 and deeper states. > > Changes from v1 : > - Code is rebased on Nick Piggin's v4 patch "powerpc/64s: reimplement book3s >idle code in C" > http://patchwork.ozlabs.org/patch/969596/ > - All the states that loses hypervisor states will be handled by OPAL > - All the decision making such as identifying first thread in >the core and taking locks before restoring in such cases have also been >moved to OPAL > > > Abhishek Goel (1): > cpuidle/powernv: save-restore sprs in opal > > Akshay Adiga (2): > cpuidle/powernv: Add support for states with ibm,cpuidle-state-v1 > powernv/cpuidle: Pass pointers instead of values to stop loop > > arch/powerpc/include/asm/cpuidle.h| 9 + > arch/powerpc/include/asm/opal-api.h | 4 +- > arch/powerpc/include/asm/opal.h | 3 + > arch/powerpc/include/asm/processor.h | 8 +- > arch/powerpc/kernel/idle_book3s.S | 6 +- > arch/powerpc/platforms/powernv/idle.c | 247 ++---
Re: [PATCH 05.1/16] of:overlay: missing name, phandle, linux, phandle in new nodes
On 10/10/18 14:03, Frank Rowand wrote: > On 10/10/18 13:40, Alan Tull wrote: >> On Wed, Oct 10, 2018 at 1:49 AM Frank Rowand wrote: >>> >>> On 10/09/18 23:04, frowand.l...@gmail.com wrote: >>>> From: Frank Rowand >>>> >>>> >>>> "of: overlay: use prop add changeset entry for property in new nodes" >>>> fixed a problem where an 'update property' changeset entry was >>>> created for properties contained in nodes added by a changeset. >>>> The fix was to use an 'add property' changeset entry. >>>> >>>> This exposed more bugs in the apply overlay code. The properties >>>> 'name', 'phandle', and 'linux,phandle' were filtered out by >>>> add_changeset_property() as special properties. Change the filter >>>> to be only for existing nodes, not newly added nodes. >>>> >>>> The second bug is that the 'name' property does not exist in the >>>> newest FDT version, and has to be constructed from the node's >>>> full_name. Construct an 'add property' changeset entry for >>>> newly added nodes. >>>> >>>> Signed-off-by: Frank Rowand >>>> --- >>>> >>>> >>>> Hi Alan, >>>> >>>> Thanks for reporting the problem with missing node names. >>>> >>>> I was able to replicate the problem, and have created this preliminary >>>> version of a patch to fix the problem. >>>> >>>> I have not extensively reviewed the patch yet, but would appreciate >>>> if you can confirm this fixes your problem. >>>> >>>> I created this patch as patch 17 of the series, but have also >>>> applied it as patch 05.1, immediately after patch 05/16, and >>>> built the kernel, booted, and verified name and phandle for >>>> one of the nodes in a unittest overlay for both cases. So >>>> minimal testing so far on my part. >>>> >>>> I have not verified whether the series builds and boots after >>>> each of patches 06..16 if this patch is applied as patch 05.1. >>>> >>>> There is definitely more work needed for me to complete this >>>> patch because it allocates some more memory, but does not yet >>>> free it when the overlay is released. >>>> >>>> -Frank >>>> >>>> >>>> drivers/of/overlay.c | 72 >>>> >>>> 1 file changed, 67 insertions(+), 5 deletions(-) >>>> >>>> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c >>>> index 0b0904f44bc7..9746cea2aa91 100644 >>>> --- a/drivers/of/overlay.c >>>> +++ b/drivers/of/overlay.c >>>> @@ -301,10 +301,11 @@ static int add_changeset_property(struct >>>> overlay_changeset *ovcs, >>>> struct property *new_prop = NULL, *prop; >>>> int ret = 0; >>>> >>>> - if (!of_prop_cmp(overlay_prop->name, "name") || >>>> - !of_prop_cmp(overlay_prop->name, "phandle") || >>>> - !of_prop_cmp(overlay_prop->name, "linux,phandle")) >>>> - return 0; >>>> + if (target->in_livetree) >>>> + if (!of_prop_cmp(overlay_prop->name, "name") || >>>> + !of_prop_cmp(overlay_prop->name, "phandle") || >>>> + !of_prop_cmp(overlay_prop->name, "linux,phandle")) >>>> + return 0; >>> >>> This is a big hammer patch. >>> >>> Nobody should waste time reviewing this patch. >> >> I wasn't clear if you still could use the testing so I did re-run my >> test. This patch adds back some of the missing properties, but the >> the kobject names aren't set as dev_name() returns NULL: >> >> * without this patch some of_node properties don't show up in sysfs: >> root@arria10:~# ls >> /sys/bus/platform/drivers/altera_freeze_br/ff200450.\/of_node >> clockscompatibleinterrupt-parent interruptsreg >> >> * with this patch, the of_node properties phandle and name are back: >> root@arria10:~# ls >> /sys/bus/platform/drivers/altera_freeze_br/ff200450.\/of_node >> clockscompatibleinterrupt-parent interrupts >>
Re: [PATCH 05.1/16] of:overlay: missing name, phandle, linux, phandle in new nodes
On 10/10/18 13:40, Alan Tull wrote: > On Wed, Oct 10, 2018 at 1:49 AM Frank Rowand wrote: >> >> On 10/09/18 23:04, frowand.l...@gmail.com wrote: >>> From: Frank Rowand >>> >>> >>> "of: overlay: use prop add changeset entry for property in new nodes" >>> fixed a problem where an 'update property' changeset entry was >>> created for properties contained in nodes added by a changeset. >>> The fix was to use an 'add property' changeset entry. >>> >>> This exposed more bugs in the apply overlay code. The properties >>> 'name', 'phandle', and 'linux,phandle' were filtered out by >>> add_changeset_property() as special properties. Change the filter >>> to be only for existing nodes, not newly added nodes. >>> >>> The second bug is that the 'name' property does not exist in the >>> newest FDT version, and has to be constructed from the node's >>> full_name. Construct an 'add property' changeset entry for >>> newly added nodes. >>> >>> Signed-off-by: Frank Rowand >>> --- >>> >>> >>> Hi Alan, >>> >>> Thanks for reporting the problem with missing node names. >>> >>> I was able to replicate the problem, and have created this preliminary >>> version of a patch to fix the problem. >>> >>> I have not extensively reviewed the patch yet, but would appreciate >>> if you can confirm this fixes your problem. >>> >>> I created this patch as patch 17 of the series, but have also >>> applied it as patch 05.1, immediately after patch 05/16, and >>> built the kernel, booted, and verified name and phandle for >>> one of the nodes in a unittest overlay for both cases. So >>> minimal testing so far on my part. >>> >>> I have not verified whether the series builds and boots after >>> each of patches 06..16 if this patch is applied as patch 05.1. >>> >>> There is definitely more work needed for me to complete this >>> patch because it allocates some more memory, but does not yet >>> free it when the overlay is released. >>> >>> -Frank >>> >>> >>> drivers/of/overlay.c | 72 >>> >>> 1 file changed, 67 insertions(+), 5 deletions(-) >>> >>> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c >>> index 0b0904f44bc7..9746cea2aa91 100644 >>> --- a/drivers/of/overlay.c >>> +++ b/drivers/of/overlay.c >>> @@ -301,10 +301,11 @@ static int add_changeset_property(struct >>> overlay_changeset *ovcs, >>> struct property *new_prop = NULL, *prop; >>> int ret = 0; >>> >>> - if (!of_prop_cmp(overlay_prop->name, "name") || >>> - !of_prop_cmp(overlay_prop->name, "phandle") || >>> - !of_prop_cmp(overlay_prop->name, "linux,phandle")) >>> - return 0; >>> + if (target->in_livetree) >>> + if (!of_prop_cmp(overlay_prop->name, "name") || >>> + !of_prop_cmp(overlay_prop->name, "phandle") || >>> + !of_prop_cmp(overlay_prop->name, "linux,phandle")) >>> + return 0; >> >> This is a big hammer patch. >> >> Nobody should waste time reviewing this patch. > > I wasn't clear if you still could use the testing so I did re-run my > test. This patch adds back some of the missing properties, but the > the kobject names aren't set as dev_name() returns NULL: > > * without this patch some of_node properties don't show up in sysfs: > root@arria10:~# ls > /sys/bus/platform/drivers/altera_freeze_br/ff200450.\/of_node > clockscompatibleinterrupt-parent interruptsreg > > * with this patch, the of_node properties phandle and name are back: > root@arria10:~# ls > /sys/bus/platform/drivers/altera_freeze_br/ff200450.\/of_node > clockscompatibleinterrupt-parent interrupts > name phandle reg Thanks for the testing. I'll keep chasing after this problem today. This is useful data for me as I was not looking under the /sys/bus/... tree that you reported, but was instead looking at /proc/device-tree/... which showed the same type of problem since the overlay I was using does not show up under /sys/bus/... I'll have to create a u
Re: [PATCH 05.1/16] of:overlay: missing name, phandle, linux, phandle in new nodes
On 10/09/18 23:04, frowand.l...@gmail.com wrote: > From: Frank Rowand > > > "of: overlay: use prop add changeset entry for property in new nodes" > fixed a problem where an 'update property' changeset entry was > created for properties contained in nodes added by a changeset. > The fix was to use an 'add property' changeset entry. > > This exposed more bugs in the apply overlay code. The properties > 'name', 'phandle', and 'linux,phandle' were filtered out by > add_changeset_property() as special properties. Change the filter > to be only for existing nodes, not newly added nodes. > > The second bug is that the 'name' property does not exist in the > newest FDT version, and has to be constructed from the node's > full_name. Construct an 'add property' changeset entry for > newly added nodes. > > Signed-off-by: Frank Rowand > --- > > > Hi Alan, > > Thanks for reporting the problem with missing node names. > > I was able to replicate the problem, and have created this preliminary > version of a patch to fix the problem. > > I have not extensively reviewed the patch yet, but would appreciate > if you can confirm this fixes your problem. > > I created this patch as patch 17 of the series, but have also > applied it as patch 05.1, immediately after patch 05/16, and > built the kernel, booted, and verified name and phandle for > one of the nodes in a unittest overlay for both cases. So > minimal testing so far on my part. > > I have not verified whether the series builds and boots after > each of patches 06..16 if this patch is applied as patch 05.1. > > There is definitely more work needed for me to complete this > patch because it allocates some more memory, but does not yet > free it when the overlay is released. > > -Frank > > > drivers/of/overlay.c | 72 > > 1 file changed, 67 insertions(+), 5 deletions(-) > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c > index 0b0904f44bc7..9746cea2aa91 100644 > --- a/drivers/of/overlay.c > +++ b/drivers/of/overlay.c > @@ -301,10 +301,11 @@ static int add_changeset_property(struct > overlay_changeset *ovcs, > struct property *new_prop = NULL, *prop; > int ret = 0; > > - if (!of_prop_cmp(overlay_prop->name, "name") || > - !of_prop_cmp(overlay_prop->name, "phandle") || > - !of_prop_cmp(overlay_prop->name, "linux,phandle")) > - return 0; > + if (target->in_livetree) > + if (!of_prop_cmp(overlay_prop->name, "name") || > + !of_prop_cmp(overlay_prop->name, "phandle") || > + !of_prop_cmp(overlay_prop->name, "linux,phandle")) > + return 0; This is a big hammer patch. Nobody should waste time reviewing this patch. The following part should not be needed (though the above section might have to become _slightly_ more complex). -Frank > > if (target->in_livetree) > prop = of_find_property(target->np, overlay_prop->name, NULL); > @@ -443,10 +444,13 @@ static int build_changeset_next_level(struct > overlay_changeset *ovcs, > struct target *target, const struct device_node *overlay_node) > { > struct device_node *child; > - struct property *prop; > + struct property *prop, *name_prop; > + bool has_name = false; > int ret; > > for_each_property_of_node(overlay_node, prop) { > + if (!strcmp(prop->name, "name")) > + has_name = true; > ret = add_changeset_property(ovcs, target, prop, 0); > if (ret) { > pr_debug("Failed to apply prop @%pOF/%s, err=%d\n", > @@ -455,6 +459,57 @@ static int build_changeset_next_level(struct > overlay_changeset *ovcs, > } > } > > + /* > + * With FDT version 0x10 we may not have the name property, > + * recreate it here from the unit name if absent > + */ > + > + if (!has_name) { > + const char *p = target->np->full_name, *ps = p, *pa = NULL; > + int len; > + > + /* > + * zzz > + * TODO: stash name_prop on a list in ovcs, to be freed > + * after overlay removed > + */ > + > + while (*p) { > + if ((*p) == '@') > + pa = p; > + else if ((*p) == '
Re: [PATCH 05/16] of: overlay: use prop add changeset entry for property in new nodes
On 10/09/18 13:28, Alan Tull wrote: > On Thu, Oct 4, 2018 at 11:14 PM wrote: >> >> From: Frank Rowand >> > > Hi Frank, > >> The changeset entry 'update property' was used for new properties in >> an overlay instead of 'add property'. >> >> The decision of whether to use 'update property' was based on whether >> the property already exists in the subtree where the node is being >> spliced into. At the top level of creating a changeset describing the >> overlay, the target node is in the live devicetree, so checking whether >> the property exists in the target node returns the correct result. >> As soon as the changeset creation algorithm recurses into a new node, >> the target is no longer in the live devicetree, but is instead in the >> detached overlay tree, thus all properties are incorrectly found to >> already exist in the target. > > When I applied an overlay (that added a few gpio controllers, etc), > the node names for nodes added from the overlay end up NULL. It I'll look at this tonight. -Frank > seems related to this patch and the next one. I haven't completely > root caused this but if I back out to before this patch, the situation > is fixed. > > root@arria10:~/unit_tests# ls /sys/bus/platform/drivers/altera_gpio/ > bind ff200010. ff200020. ff200030. > uevent unbind > > root@arria10:~/unit_tests# ls /sys/bus/platform/drivers/altera_freeze_br/ > bind ff200450. uevent unbind > > Alan > >> >> This fix will expose another devicetree bug that will be fixed >> in the following patch in the series. >> >> When this patch is applied the errors reported by the devictree >> unittest will change, and the unittest results will change from: >> >>### dt-test ### end of unittest - 210 passed, 0 failed >> >> to >> >>### dt-test ### end of unittest - 203 passed, 7 failed >> >> Signed-off-by: Frank Rowand >> --- >> drivers/of/overlay.c | 112 >> ++- >> 1 file changed, 74 insertions(+), 38 deletions(-) >> >> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c >> index 32cfee68f2e3..0b0904f44bc7 100644 >> --- a/drivers/of/overlay.c >> +++ b/drivers/of/overlay.c >> @@ -24,6 +24,26 @@ >> #include "of_private.h" >> >> /** >> + * struct target - info about current target node as recursing through >> overlay >> + * @np:node where current level of overlay will be >> applied >> + * @in_livetree: @np is a node in the live devicetree >> + * >> + * Used in the algorithm to create the portion of a changeset that describes >> + * an overlay fragment, which is a devicetree subtree. Initially @np is a >> node >> + * in the live devicetree where the overlay subtree is targeted to be >> grafted >> + * into. When recursing to the next level of the overlay subtree, the >> target >> + * also recurses to the next level of the live devicetree, as long as >> overlay >> + * subtree node also exists in the live devicetree. When a node in the >> overlay >> + * subtree does not exist at the same level in the live devicetree, >> target->np >> + * points to a newly allocated node, and all subsequent targets in the >> subtree >> + * will be newly allocated nodes. >> + */ >> +struct target { >> + struct device_node *np; >> + bool in_livetree; >> +}; >> + >> +/** >> * struct fragment - info about fragment nodes in overlay expanded device >> tree >> * @target:target of the overlay operation >> * @overlay: pointer to the __overlay__ node >> @@ -72,8 +92,7 @@ static int devicetree_corrupt(void) >> } >> >> static int build_changeset_next_level(struct overlay_changeset *ovcs, >> - struct device_node *target_node, >> - const struct device_node *overlay_node); >> + struct target *target, const struct device_node >> *overlay_node); >> >> /* >> * of_resolve_phandles() finds the largest phandle in the live tree. >> @@ -257,14 +276,17 @@ static struct property *dup_and_fixup_symbol_prop( >> /** >> * add_changeset_property() - add @overlay_prop to overlay changeset >> * @ovcs: overlay changeset >> - * @target_node: where to place @overlay_prop in live tree >> + * @target:where @overlay_prop will be placed >> * @ov
Re: [PATCH 09/16] of: overlay: validate overlay properties #address-cells and #size-cells
On 10/08/18 11:46, Alan Tull wrote: > On Mon, Oct 8, 2018 at 10:57 AM Alan Tull wrote: >> >> On Thu, Oct 4, 2018 at 11:14 PM wrote: >>> >>> From: Frank Rowand >>> >>> If overlay properties #address-cells or #size-cells are already in >>> the live devicetree for any given node, then the values in the >>> overlay must match the values in the live tree. >> >> Hi Frank, >> >> I'm starting some FPGA testing on this patchset applied to v4.19-rc7. >> That applied cleanly; if that's not the best base to test against, >> please let me know. I would expect -rc7 to be ok to test against. I'm doing the development of it on -rc1. Thanks for the testing. >> On a very simple overlay, I'm seeing this patch's warning catching >> things other than #address-cells or #size-cells. #address-cells and #size-cells escape the warning for properties on an existing (non-overlay) node if the existing node already contains them as a special case. Those two properties are needed in the overlay to avoid dtc compiler warnings. If the same properties already exist in the base devicetree and have the same values as in the overlay then there is no need to add property update changeset entries in the overlay changeset. Since there will not be changeset entries for those two properties, there will be no memory leak when the changeset is removed. The special casing of #address-cells and #size-cells is part of the fix patches that are a result of the validation patches. Thus a little bit less memory leaking than we have today. > What it's warning about are new properties being added to an existing > node. So !prop is true and !of_node_check_flag(target->np, > OF_OVERLAY) also is true. Is that a potential memory leak as you are > warning? If so, your code is working as planned and you'll just need > to document that also in the header. Yes, you are accurately describing what the check is catching. The memory leak (on release) is because the memory allocated for overlay properties is released when the reference count of the node they are attached is decremented to zero, but only if the node is a dynamic flagged node (as overlays are). The memory allocated for the overlay properties will not be freed in this case because the node is not a dynamic node. >> I'm just getting >> started looking at this, will spend time understanding this better and >> I'll test other overlays. The warnings were: >> >> Applying dtbo: socfpga_overlay.dtb >> [ 33.117881] fpga_manager fpga0: writing soc_system.rbf to Altera >> SOCFPGA FPGA Manager >> [ 33.575223] OF: overlay: WARNING: add_changeset_property(), memory >> leak will occur if overlay removed. Property: >> /soc/base-fpga-region/firmware-name >> [ 33.588584] OF: overlay: WARNING: add_changeset_property(), memory >> leak will occur if overlay removed. Property: >> /soc/base-fpga-region/fpga-bridges >> [ 33.601856] OF: overlay: WARNING: add_changeset_property(), memory >> leak will occur if overlay removed. Property: >> /soc/base-fpga-region/ranges Are there properties in /soc/base-fpga-region/ in the base devicetree? If not, then that node could be removed from the base devicetree and first created in an overlay. If so, is it possible to add an additional level of node, /soc/base-fpga-region/foo, which would contain the properties that are warned about above? Then the properties would be children of an overlay node and the memory would be freed on overlay release. This is not actually a suggestion that should be implemented right now, just trying to understand the possible alternatives, because this would result in an arbitrary fake level in the tree (which I don't like). My intent is to leave these validation checks as warnings while we figure out the best way to solve the underlying memory leak issue. Note that some of the validation checks result in errors and cause an overlay apply to fail. If I did those checks correctly, they should only catch cases where the live tree after applying the overlay was a "corrupt" tree instead of the desired changes. I expect that Plumbers will be a good place to explore these things. >> Here's part of that overlay including the properties it's complaining about: >> >> /dts-v1/; >> /plugin/; >> / { >> fragment@0 { >> target = <&base_fpga_region>; >> #address-cells = <1>; >> #size-cells = <1>; >> __overlay__ { >> #address-cells = <1>; >> #size-cells = <1>; >> >> firmware-
Re: [PATCH 09/16] of: overlay: validate overlay properties #address-cells and #size-cells
On 10/05/18 12:04, Rob Herring wrote: > On Fri, Oct 5, 2018 at 1:53 PM Frank Rowand wrote: >> >> On 10/05/18 08:07, Rob Herring wrote: >>> On Thu, Oct 4, 2018 at 11:14 PM wrote: >>>> >>>> From: Frank Rowand >>>> >>>> If overlay properties #address-cells or #size-cells are already in >>>> the live devicetree for any given node, then the values in the >>>> overlay must match the values in the live tree. >>>> >>>> If the properties are already in the live tree then there is no >>>> need to create a changeset entry to add them since they must >>>> have the same value. This reduces the memory used by the >>>> changeset and eliminates a possible memory leak. This is >>>> verified by 12 fewer warnings during the devicetree unittest, >>>> as the possible memory leak warnings about #address-cells and >>> >>> and...? >> >> #size-cells no longer occur. >> >> (Thanks for catching that.) >> >> >>>> >>>> Signed-off-by: Frank Rowand >>>> --- >>>> drivers/of/overlay.c | 38 +++--- >>>> 1 file changed, 35 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c >>>> index 29c33a5c533f..e6fb3ffe9d93 100644 >>>> --- a/drivers/of/overlay.c >>>> +++ b/drivers/of/overlay.c >>>> @@ -287,7 +287,12 @@ static struct property *dup_and_fixup_symbol_prop( >>>> * @target may be either in the live devicetree or in a new subtree that >>>> * is contained in the changeset. >>>> * >>>> - * Some special properties are not updated (no error returned). >>>> + * Some special properties are not added or updated (no error returned): >>>> + * "name", "phandle", "linux,phandle". >>>> + * >>>> + * Properties "#address-cells" and "#size-cells" are not updated if they >>>> + * are already in the live tree, but if present in the live tree, the >>>> values >>>> + * in the overlay must match the values in the live tree. >>> >>> Perhaps this should be generalized to apply to any property? We can't >>> really deal with property values changing on the fly anyways. >> >> That is a bigger discussion. I'd prefer to not hold up this series for that >> question to be resolved. It will be easy enough to generalize in an add-on >> patch later. > > Fair enough. > >>>> + if (prop->length != 4 || new_prop->length != 4 || >>>> + *(u32 *)prop->value != *(u32 *)new_prop->value) >>> >>> Technically these are __be32 types. This could use a helper >>> (of_prop_val_eq). >> >> These are in a unpacked form, so cpu byte order, not FDT byte order. > > You sure about that? Unpacking doesn't change the order. It can't > because the type is unknown. The value of 'value' is the address of > the data in the FDT. Aargh. You are totally right. >>> I'm not sure we really need to validate the length here as dtc does >>> that (but yes, not everything is from dtc). >> >> Since I'm accessing 4 bytes of the values, I need to be sure the lengths >> are at least 4. For #address-cells and #size-cells the property is >> specified as four bytes, so I could simplify the code for the specific case. >> >> If this gets extended to any arbitrary property then a new of_prop_val_eq() >> would check that the lengths are equal and the values (of size length) are >> also equal. > > Right, that's what I was thinking. Check lengths are equal and then > you can just do a memcmp(). Based on all of this it seems better that I create of_prop_val_eq(), as you suggested, and change to use that. > > Rob >
Re: [PATCH 15/16] of: unittest: initialize args before calling of_irq_parse_one()
On 10/05/18 06:26, Guenter Roeck wrote: > On 10/04/2018 09:12 PM, frowand.l...@gmail.com wrote: >> From: Frank Rowand >> >> Callers of of_irq_parse_one() blindly use the pointer args.np >> without checking whether of_irq_parse_one() had an error and >> thus did not set the value of args.np. Initialize args to >> zero so that using the format "%pOF" to show the value of >> args.np will show "(null)" when of_irq_parse_one() has an >> error and does not set args.np instead of trying to >> dereference a random value. >> >> Reported-by: Guenter Roeck >> Signed-off-by: Frank Rowand > > Reviewed-by: Guenter Roeck > > The same problem exists when of_parse_phandle_with_args() reports an error. Thanks, I'll add a fix for that. -Frank > > Guenter > >> --- >> drivers/of/unittest.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c >> index 6d80f474c8f2..b61a33f30a56 100644 >> --- a/drivers/of/unittest.c >> +++ b/drivers/of/unittest.c >> @@ -780,7 +780,7 @@ static void __init of_unittest_parse_interrupts(void) >> for (i = 0; i < 4; i++) { >> bool passed = true; >> - args.args_count = 0; >> + memset(&args, 0, sizeof(args)); >> rc = of_irq_parse_one(np, i, &args); >> passed &= !rc; >> @@ -801,7 +801,7 @@ static void __init of_unittest_parse_interrupts(void) >> for (i = 0; i < 4; i++) { >> bool passed = true; >> - args.args_count = 0; >> + memset(&args, 0, sizeof(args)); >> rc = of_irq_parse_one(np, i, &args); >> /* Test the values from tests-phandle.dtsi */ >> @@ -854,6 +854,7 @@ static void __init >> of_unittest_parse_interrupts_extended(void) >> for (i = 0; i < 7; i++) { >> bool passed = true; >> + memset(&args, 0, sizeof(args)); >> rc = of_irq_parse_one(np, i, &args); >> /* Test the values from tests-phandle.dtsi */ >> > >
Re: [PATCH 15/16] of: unittest: initialize args before calling of_irq_parse_one()
On 10/05/18 07:53, Rob Herring wrote: > On Thu, Oct 4, 2018 at 11:14 PM wrote: >> >> From: Frank Rowand >> >> Callers of of_irq_parse_one() blindly use the pointer args.np >> without checking whether of_irq_parse_one() had an error and >> thus did not set the value of args.np. Initialize args to >> zero so that using the format "%pOF" to show the value of >> args.np will show "(null)" when of_irq_parse_one() has an >> error and does not set args.np instead of trying to >> dereference a random value. >> >> Reported-by: Guenter Roeck >> Signed-off-by: Frank Rowand >> --- >> drivers/of/unittest.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) > > Does this need to be part of this series? I do not know if it could be independently applied before the rest of the series (I have not tested that). I included it in the series because the series has so many other changes to unittest.c. If you want me to break this out, I will remove it from this series and resend it after the rest of the series has been pulled to mainline (and rebase on the new mainline). This patch is not fixing a known failure case - the current test data does not trigger the problem. The recent patch from Guenter that you already accepted fixes the known failure case, so this patch is not urgent. The same is true about the other cases Guenter pointed out that this patch should fix. -Frank
Re: [PATCH 09/16] of: overlay: validate overlay properties #address-cells and #size-cells
On 10/05/18 08:07, Rob Herring wrote: > On Thu, Oct 4, 2018 at 11:14 PM wrote: >> >> From: Frank Rowand >> >> If overlay properties #address-cells or #size-cells are already in >> the live devicetree for any given node, then the values in the >> overlay must match the values in the live tree. >> >> If the properties are already in the live tree then there is no >> need to create a changeset entry to add them since they must >> have the same value. This reduces the memory used by the >> changeset and eliminates a possible memory leak. This is >> verified by 12 fewer warnings during the devicetree unittest, >> as the possible memory leak warnings about #address-cells and > > and...? #size-cells no longer occur. (Thanks for catching that.) >> >> Signed-off-by: Frank Rowand >> --- >> drivers/of/overlay.c | 38 +++--- >> 1 file changed, 35 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c >> index 29c33a5c533f..e6fb3ffe9d93 100644 >> --- a/drivers/of/overlay.c >> +++ b/drivers/of/overlay.c >> @@ -287,7 +287,12 @@ static struct property *dup_and_fixup_symbol_prop( >> * @target may be either in the live devicetree or in a new subtree that >> * is contained in the changeset. >> * >> - * Some special properties are not updated (no error returned). >> + * Some special properties are not added or updated (no error returned): >> + * "name", "phandle", "linux,phandle". >> + * >> + * Properties "#address-cells" and "#size-cells" are not updated if they >> + * are already in the live tree, but if present in the live tree, the values >> + * in the overlay must match the values in the live tree. > > Perhaps this should be generalized to apply to any property? We can't > really deal with property values changing on the fly anyways. That is a bigger discussion. I'd prefer to not hold up this series for that question to be resolved. It will be easy enough to generalize in an add-on patch later. >> * >> * Update of property in symbols node is not allowed. >> * >> @@ -300,6 +305,7 @@ static int add_changeset_property(struct >> overlay_changeset *ovcs, >> { >> struct property *new_prop = NULL, *prop; >> int ret = 0; >> + bool check_for_non_overlay_node = false; >> >> if (!of_prop_cmp(overlay_prop->name, "name") || >> !of_prop_cmp(overlay_prop->name, "phandle") || >> @@ -322,13 +328,39 @@ static int add_changeset_property(struct >> overlay_changeset *ovcs, >> if (!new_prop) >> return -ENOMEM; >> >> - if (!prop) >> + if (!prop) { >> + > > Remove the extra blank lines. Will do. > >> + check_for_non_overlay_node = true; >> ret = of_changeset_add_property(&ovcs->cset, target->np, >> new_prop); >> - else >> + >> + } else if (!of_prop_cmp(prop->name, "#address-cells")) { >> + >> + if (prop->length != 4 || new_prop->length != 4 || >> + *(u32 *)prop->value != *(u32 *)new_prop->value) > > Technically these are __be32 types. This could use a helper (of_prop_val_eq). These are in a unpacked form, so cpu byte order, not FDT byte order. > > I'm not sure we really need to validate the length here as dtc does > that (but yes, not everything is from dtc). Since I'm accessing 4 bytes of the values, I need to be sure the lengths are at least 4. For #address-cells and #size-cells the property is specified as four bytes, so I could simplify the code for the specific case. If this gets extended to any arbitrary property then a new of_prop_val_eq() would check that the lengths are equal and the values (of size length) are also equal. > >> + pr_err("ERROR: overlay and/or live tree >> #address-cells invalid in node %pOF\n", >> + target->np); >> + >> + } else if (!of_prop_cmp(prop->name, "#size-cells")) { >> + >> + if (prop->length != 4 || new_prop->length != 4 || >> + *(u32 *)prop->value != *(u32 *)new_prop->value) >> + pr_err("ERROR: overlay and/or live tree #size-cells >> invalid in node %pOF\n", >> + target->np); >> + >> + } else { >> + >> + check_for_non_overlay_node = true; >> ret = of_changeset_update_property(&ovcs->cset, target->np, >>new_prop); >> >> + } >> + >> + if (check_for_non_overlay_node && >> + !of_node_check_flag(target->np, OF_OVERLAY)) >> + pr_err("WARNING: %s(), memory leak will occur if overlay >> removed. Property: %pOF/%s\n", >> + __func__, target->np, new_prop->name); >> + >> if (ret) { >> kfree(new_prop->name); >> kfree(new_prop->value); >> -- >> Frank Rowand >> >
Re: drivers binding to device node with multiple compatible strings
+ Frank On 09/27/18 15:25, Li Yang wrote: > Hi Rob and Grant, > > Various device tree specs are recommending to include all the > potential compatible strings in the device node, with the order from > most specific to most general. But it looks like Linux kernel doesn't > provide a way to bind the device to the most specific driver, however, > the first registered compatible driver will be bound. > > As more and more generic drivers are added to the Linux kernel, they > are competing with the more specific vendor drivers and causes problem > when both are built into the kernel. I'm wondering if there is a > generic solution (or in plan) to make the most specific driver bound > to the device. Or we have to disable the more general driver or > remove the more general compatible string from the device tree? > > Regards, > Leo >
Re: [PATCH 3/3] scripts/dtc: Update to upstream version v1.4.7-14-gc86da84d30e4
On 09/18/18 11:55, Rob Herring wrote: > On Fri, Sep 14, 2018 at 2:32 PM Frank Rowand wrote: >> >> On 09/13/18 13:28, Rob Herring wrote: >>> Major changes are I2C and SPI bus checks, YAML output format (for >>> future validation), some new libfdt functions, and more libfdt >>> validation of dtbs. >>> >>> The YAML addition adds an optional dependency on libyaml. pkg-config is >>> used to test for it and pkg-config became a kconfig dependency in 4.18. >> >> For Ubuntu, the libyaml dependency is provided by the packages: >> >>libyaml-0-2 >>libyaml-dev > > Yes, but as it is not yet required by anything in the kernel I don't > think that needs to be documented yet. Also, offhand, I don't think we > generally document in the kernel distro specifics like package names. > > Rob > Agreed. I was providing information that might save other people a bit of research. It is sufficiently visible in the email thread and does not need to be in the commit message.
Re: [PATCH 3/3] scripts/dtc: Update to upstream version v1.4.7-14-gc86da84d30e4
On 09/13/18 13:28, Rob Herring wrote: > Major changes are I2C and SPI bus checks, YAML output format (for > future validation), some new libfdt functions, and more libfdt > validation of dtbs. > > The YAML addition adds an optional dependency on libyaml. pkg-config is > used to test for it and pkg-config became a kconfig dependency in 4.18. For Ubuntu, the libyaml dependency is provided by the packages: libyaml-0-2 libyaml-dev -Frank > > This adds the following commits from upstream: > > c86da84d30e4 Add support for YAML encoded output > 361b5e7d8067 Make type_marker_length helper public < snip >
Re: [PATCH] of: fix phandle cache creation for DTs with no phandles
On 09/11/18 07:56, Rob Herring wrote: > With commit 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of > of_find_node_by_phandle()"), a G3 PowerMac fails to boot. The root cause > is the DT for this system has no phandle properties when booted with > BootX. of_populate_phandle_cache() does not handle the case of no > phandles correctly. The problem is roundup_pow_of_two() for 0 is > undefined. The implementation subtracts 1 underflowing and then things > are in the weeds. > > Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of > of_find_node_by_phandle()") > Cc: sta...@vger.kernel.org # 4.17+ > Reported-by: Finn Thain > Tested-by: Stan Johnson > Cc: Frank Rowand > Cc: Benjamin Herrenschmidt > Signed-off-by: Rob Herring > --- > Here's a formal patch of what Stan tested. Will send to Linus this week. > > Rob > > drivers/of/base.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/of/base.c b/drivers/of/base.c > index a055cd1ef96d..17ae594b7014 100644 > --- a/drivers/of/base.c > +++ b/drivers/of/base.c > @@ -140,6 +140,9 @@ void of_populate_phandle_cache(void) > if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL) > phandles++; > > + if (!phandles) > + goto out; > + > cache_entries = roundup_pow_of_two(phandles); > phandle_cache_mask = cache_entries - 1; > > Reviewed-by: Frank Rowand
Re: v4.17 regression: PowerMac G3 won't boot, was Re: [PATCH v5 1/3] of: cache phandle nodes to reduce cost of of_find_node_by_phandle()
On 09/10/18 05:53, Rob Herring wrote: > On Sun, Sep 09, 2018 at 07:04:25PM +0200, Benjamin Herrenschmidt wrote: >> On Fri, 2018-08-31 at 14:58 +1000, Benjamin Herrenschmidt wrote: >>> >>>> A long shot, but something to consider, is that I failed to cover the >>>> cases of dynamic devicetree updates (removing nodes that contain a >>>> phandle) in ways other than overlays. Michael Ellerman has reported >>>> such a problem for powerpc/mobility with of_detach_node(). A patch to >>>> fix that is one of the tasks I need to complete. >>> >>> The only thing I can think of is booting via the BootX bootloader on >>> those ancient macs results in a DT with no phandles. I didn't see an >>> obvious reason why that would cause that patch to break though. >> >> Guys, we still don't have a fix for this one on its way upstream... >> >> My test patch just creates phandle properties for all nodes, that was >> not intended as a fix, more a way to check if the problem was related >> to the lack of phandles. >> >> I don't actually know why the new code causes things to fail when >> phandles are absent. This needs to be looked at. >> >> I'm travelling at the moment and generally caught up with other things, >> I haven't had a chance to dig, so just a heads up. I don't intend to >> submit my patch since it's just a band aid. We need to figure out what >> the actual problem is. > > Can you try this patch (w/o Ben's patch). I think the problem is if > there are no phandles, then roundup_pow_of_two is passed 0 which is > documented as undefined result. > > Though, if a DT has no properties with phandles, then why are we doing a > lookup in the first place? > > > 8<-- > > diff --git a/drivers/of/base.c b/drivers/of/base.c > index 9095b8290150..74eaedd5b860 100644 > --- a/drivers/of/base.c > +++ b/drivers/of/base.c > @@ -140,6 +140,9 @@ void of_populate_phandle_cache(void) > if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL) > phandles++; > > + if (!phandles) > + goto out; > + > cache_entries = roundup_pow_of_two(phandles); > phandle_cache_mask = cache_entries - 1; > > Thanks Rob! That fix makes sense, and the test results look promising. Reviewed-by: Frank Rowand
Re: v4.17 regression: PowerMac G3 won't boot, was Re: [PATCH v5 1/3] of: cache phandle nodes to reduce cost of of_find_node_by_phandle()
On 09/09/18 10:04, Benjamin Herrenschmidt wrote: > On Fri, 2018-08-31 at 14:58 +1000, Benjamin Herrenschmidt wrote: >> >>> A long shot, but something to consider, is that I failed to cover the >>> cases of dynamic devicetree updates (removing nodes that contain a >>> phandle) in ways other than overlays. Michael Ellerman has reported >>> such a problem for powerpc/mobility with of_detach_node(). A patch to >>> fix that is one of the tasks I need to complete. >> >> The only thing I can think of is booting via the BootX bootloader on >> those ancient macs results in a DT with no phandles. I didn't see an >> obvious reason why that would cause that patch to break though. > > Guys, we still don't have a fix for this one on its way upstream... > > My test patch just creates phandle properties for all nodes, that was > not intended as a fix, more a way to check if the problem was related > to the lack of phandles. > > I don't actually know why the new code causes things to fail when > phandles are absent. This needs to be looked at. > > I'm travelling at the moment and generally caught up with other things, > I haven't had a chance to dig, so just a heads up. I don't intend to > submit my patch since it's just a band aid. We need to figure out what > the actual problem is. > > Cheers, > Ben. Thanks for chasing after this, and the current heads up. I agree that we need to understand what is going on and do a real fix. -Frank
Re: v4.17 regression: PowerMac G3 won't boot, was Re: [PATCH v5 1/3] of: cache phandle nodes to reduce cost of of_find_node_by_phandle()
Hi Finn, On 08/29/18 17:44, Finn Thain wrote: > Hi Frank, > > Linux v4.17 and later will no longer boot on a G3 PowerMac. The boot hangs > very early, before any video driver loads. > > Stan and I were able to bisect the regression between v4.16 and v4.17 and > arrived at commit 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of > of_find_node_by_phandle()"). > > I don't see any obvious bug in 0b3ce78e90fc or b9952b5218ad. But if you > revert these from v4.18 (which is also affected) that certainly resolves > the issue. > > I did see this in the kernel messages: > > Duplicate name in PowerPC,750, renamed to "l2-cache#1" > Duplicate name in mac-io, renamed to "ide#1" > Duplicate name in ide#1, renamed to "atapi-disk#1" > Duplicate name in multifunc-device, renamed to "pci1799,1#1" > > No idea whether that's relevant; I haven't done any further investigation. > Complete dmesg output is attached. Please let me know if there's any more > information you need to help find the bug. > > Thanks. I don't have any useful answers yet, but I am following the thread and have also quickly scanned the two commits for any obvious cause. I will look into this some more, but have a few other tasks that I need to complete first. A long shot, but something to consider, is that I failed to cover the cases of dynamic devicetree updates (removing nodes that contain a phandle) in ways other than overlays. Michael Ellerman has reported such a problem for powerpc/mobility with of_detach_node(). A patch to fix that is one of the tasks I need to complete. -Frank
Re: phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem)
On 07/31/18 12:18, Frank Rowand wrote: > On 07/31/18 07:17, Rob Herring wrote: >> On Tue, Jul 31, 2018 at 12:34 AM Michael Ellerman >> wrote: >>> >>> Hi Rob/Frank, >>> >>> I think we might have a problem with the phandle_cache not interacting >>> well with of_detach_node(): >> >> Probably needs a similar fix as this commit did for overlays: >> >> commit b9952b5218added5577e4a3443969bc20884cea9 >> Author: Frank Rowand >> Date: Thu Jul 12 14:00:07 2018 -0700 >> >> of: overlay: update phandle cache on overlay apply and remove >> >> A comment in the review of the patch adding the phandle cache said that >> the cache would have to be updated when modules are applied and removed. >> This patch implements the cache updates. >> >> Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of >> of_find_node_by_phandle()") >> Reported-by: Alan Tull >> Suggested-by: Alan Tull >> Signed-off-by: Frank Rowand >> Signed-off-by: Rob Herring > > Agreed. Sorry about missing the of_detach_node() case. > > >> Really what we need here is an "invalidate phandle" function rather >> than free and re-allocate the whole damn cache. > > The big hammer approach was chosen to avoid the race conditions that > would otherwise occur. OF does not have a locking strategy that > would be able to protect against the races. > > We could maybe implement a slightly smaller hammer by (1) disabling > the cache, (2) invalidate a phandle entry in the cache, (3) re-enable > the cache. That is an off the cuff thought - I would have to look > a little bit more carefully to make sure it would work. > > But I don't see a need to add the complexity of the smaller hammer > or the bigger hammer of proper locking _unless_ we start seeing that > the cache is being freed and re-allocated frequently. For overlays > I don't expect the high frequency because it happens on a per overlay > removal basis (not per node removal basis). > For of_detach_node() the > event _is_ on a per node removal basis. Michael, do you expect node > removals to be a frequent event with low latency being important? If > so, a rough guess on what the frequency would be? I have not looked at how of_detach_node() is used, so it might not be very different that overlays. If a group of of_detach_node() calls are made from a common code location, the the sequence could possibly be: of_free_phandle_cache() multiple calls of of_detach_node() of_populate_phandle_cache() -Frank > > -Frank > > >> Rob >> >>> >>> Michael Bringmann writes: >>>> See below. >>>> >>>> On 07/30/2018 01:31 AM, Michael Ellerman wrote: >>>>> Michael Bringmann writes: >>>>> >>>>>> During LPAR migration, the content of the device tree/sysfs may >>>>>> be updated including deletion and replacement of nodes in the >>>>>> tree. When nodes are added to the internal node structures, they >>>>>> are appended in FIFO order to a list of nodes maintained by the >>>>>> OF code APIs. >>>>> >>>>> That hasn't been true for several years. The data structure is an n-ary >>>>> tree. What kernel version are you working on? >>>> >>>> Sorry for an error in my description. I oversimplified based on the >>>> name of a search iterator. Let me try to provide a better explanation >>>> of the problem, here. >>>> >>>> This is the problem. The PPC mobility code receives RTAS requests to >>>> delete nodes with platform-/hardware-specific attributes when restarting >>>> the kernel after a migration. My example is for migration between a >>>> P8 Alpine and a P8 Brazos. Nodes to be deleted may include >>>> 'ibm,random-v1', >>>> 'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1', >>>> or others. >>>> >>>> The mobility.c code calls 'of_detach_node' for the nodes and their >>>> children. >>>> This makes calls to detach the properties and to try to remove the >>>> associated >>>> sysfs/kernfs files. >>>> >>>> Then new copies of the same nodes are next provided by the PHYP, local >>>> copies are built, and a pointer to the 'struct device_node' is passed to >>>>
Re: phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem)
On 07/31/18 07:17, Rob Herring wrote: > On Tue, Jul 31, 2018 at 12:34 AM Michael Ellerman wrote: >> >> Hi Rob/Frank, >> >> I think we might have a problem with the phandle_cache not interacting >> well with of_detach_node(): > > Probably needs a similar fix as this commit did for overlays: > > commit b9952b5218added5577e4a3443969bc20884cea9 > Author: Frank Rowand > Date: Thu Jul 12 14:00:07 2018 -0700 > > of: overlay: update phandle cache on overlay apply and remove > > A comment in the review of the patch adding the phandle cache said that > the cache would have to be updated when modules are applied and removed. > This patch implements the cache updates. > > Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of > of_find_node_by_phandle()") > Reported-by: Alan Tull > Suggested-by: Alan Tull > Signed-off-by: Frank Rowand > Signed-off-by: Rob Herring Agreed. Sorry about missing the of_detach_node() case. > Really what we need here is an "invalidate phandle" function rather > than free and re-allocate the whole damn cache. The big hammer approach was chosen to avoid the race conditions that would otherwise occur. OF does not have a locking strategy that would be able to protect against the races. We could maybe implement a slightly smaller hammer by (1) disabling the cache, (2) invalidate a phandle entry in the cache, (3) re-enable the cache. That is an off the cuff thought - I would have to look a little bit more carefully to make sure it would work. But I don't see a need to add the complexity of the smaller hammer or the bigger hammer of proper locking _unless_ we start seeing that the cache is being freed and re-allocated frequently. For overlays I don't expect the high frequency because it happens on a per overlay removal basis (not per node removal basis). For of_detach_node() the event _is_ on a per node removal basis. Michael, do you expect node removals to be a frequent event with low latency being important? If so, a rough guess on what the frequency would be? -Frank > Rob > >> >> Michael Bringmann writes: >>> See below. >>> >>> On 07/30/2018 01:31 AM, Michael Ellerman wrote: >>>> Michael Bringmann writes: >>>> >>>>> During LPAR migration, the content of the device tree/sysfs may >>>>> be updated including deletion and replacement of nodes in the >>>>> tree. When nodes are added to the internal node structures, they >>>>> are appended in FIFO order to a list of nodes maintained by the >>>>> OF code APIs. >>>> >>>> That hasn't been true for several years. The data structure is an n-ary >>>> tree. What kernel version are you working on? >>> >>> Sorry for an error in my description. I oversimplified based on the >>> name of a search iterator. Let me try to provide a better explanation >>> of the problem, here. >>> >>> This is the problem. The PPC mobility code receives RTAS requests to >>> delete nodes with platform-/hardware-specific attributes when restarting >>> the kernel after a migration. My example is for migration between a >>> P8 Alpine and a P8 Brazos. Nodes to be deleted may include >>> 'ibm,random-v1', >>> 'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1', >>> or others. >>> >>> The mobility.c code calls 'of_detach_node' for the nodes and their children. >>> This makes calls to detach the properties and to try to remove the >>> associated >>> sysfs/kernfs files. >>> >>> Then new copies of the same nodes are next provided by the PHYP, local >>> copies are built, and a pointer to the 'struct device_node' is passed to >>> of_attach_node. Before the call to of_attach_node, the phandle is >>> initialized >>> to 0 when the data structure is alloced. During the call to of_attach_node, >>> it calls __of_attach_node which pulls the actual name and phandle from just >>> created sub-properties named something like 'name' and 'ibm,phandle'. >>> >>> This is all fine for the first migration. The problem occurs with the >>> second and subsequent migrations when the PHYP on the new system wants to >>> replace the same set of nodes again, referenced with the same names and >>> phandle values. >>> >>>> >>>>> When nodes are removed from the device tree, they >>>>> are marked OF_DETACHED, b
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
On 01/25/18 15:53, Tyrel Datwyler wrote: > On 01/25/2018 01:49 PM, Frank Rowand wrote: >> Hi Wolfram, >> >> On 01/25/18 03:03, Steven Rostedt wrote: >>> On Wed, 24 Jan 2018 22:55:13 -0800 >>> Frank Rowand wrote: >>> >>>> Hi Steve, >>> >>>> >>>> Off the top of your head, can you tell me know early in the boot >>>> process a trace_event can be called and successfully provide the >>>> data to someone trying to debug early boot issues? >>> >>> The trace events are enabled by early_initcall(). >> >> < snip > >> >> This means that ftrace can not be used for the of_node_get(), >> of_node_put(), and of_node_release() debug info, because >> these functions are called before early_initcall(). Please >> use pr_debug() for these functions. > > I would argue that early boot debugging doesn't completely negate the > usefulness of this tracing infrastructure. I did not say or imply that it did. I am pointing out that this implementation does not meet the needs of other use cases. And potentially provides misleading information (or more precisely misleading lack of information) in some other use cases. > I get that no information > is available in the trace up until ftrace is setup by its > early_initcall, but I still found issues after early boot using this > patch and I would hope that it would be somewhat obvious if > references are out of whack once the ftrace data becomes available. > In the dynamic case on Power we often do reconfig well after boot on > live systems which produces a lot of reference put/gets. This patch > made it easy to identify several reference leaks and underflows in > our attach and detach logic with the added aid of being able to turn > on the stacktrace for each call in the ftrace data. Yes, you can get stacktraces relatively easily. This is the strongest argument for using ftrace. My assumption has been that the stack trace is useful for of_node_get() and of_node_put(). Is there _large_ value to the stack trace for of_reconfig_notify()? > Another thought is it would be nice if we could have the best of both > worlds such that the tracepoints were pr_debugs up until the ftrace > early_initcall. Or, I suppose we could ifdef it and make the ftrace > tracepoints a configuration option, such that if it wasn't configured > we implement the tracepoint functions as pr_debugs. This makes early > boot an option. Just spit balling ideas. An overly complex solution. This is just debug. Worst case alternative is that the patches live on, out of tree. So nope. > > -Tyrel > >> >> As far as I know, the of_reconfig_notify() could remain an >> ftrace instrumented function. But now that the only thing >> that would be ftrace instrumented is of_reconfig_notify(), >> I don't see a strong justification for changing the existing >> pr_debug() calls to an ftrace alternative. Though I suspect >> the original author of the patch still might desire to have >> the "#ifdef DEBUG" surrounding the pr_debug() calls removed >> since one of his issues was having to recompile his kernel >> to do his debugging. >> >> -Frank >> > >
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
On 01/25/18 15:14, Wolfram Sang wrote: > >> This means that ftrace can not be used for the of_node_get(), >> of_node_put(), and of_node_release() debug info, because >> these functions are called before early_initcall(). > > For the record: You can still unbind/bind devices. This is how I > debugged an issue. I wasn't implying that the data wasn't usable for any use case. The point is that using ftrace means there are use cases for the debug information where the information will not be available.
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
On 01/25/18 15:12, Wolfram Sang wrote: > Frank, > > here seems to be a misunderstanding going on. I don't want to push this > patch upstream against all odds. I merely wanted to find out what the > status of this patch is. Because one possibility was that it had just > been forgotten... > >>> So, I thought reposting would be a good way of finding out if your >>> concerns were addressed in the discussion or not. If I overlooked Marking a patch as RFC (as you mention just below here) is very different than explicitly stating something like: Frank, you had concerns with version 1, were your concerns addressed in the thread about version 1? You mention below that adding RFC to the title was providing the information that I needed. I am saying that the communication was not clear, was implied at best, and should have be more explicit. I actually didn't even notice that the patch title had changed from not an RFC, to being an RFC, so the subtle clue went right over my head. I just treated it as I would any RFC patch. >> Then you should have stated that there were concerns raised in the >> discussion and asked me if my concerns were addressed. > > From my perspective, I have done that: > > I marked the patch as RFC. I put you on the CC list. I asked about the > possibility of applying it. It was not very elaborate, but hey, this is > just a simple debugging patch :) After reading through the original patch thread, I did not think that the issues raised had been fully addressed. You read the thread and thought they had. No big deal on coming to different conclusions. I think you and I are talking past each other a little bit. My comments in the email you are responding to are because I don't think that the previous emails have been as clear as you think. I could read between the lines and see how you think that you were being clear. But from my perspective, I'm asking for more explicit statements and less implied statements. My first real response (the response that pointed out that Rob had made an observation / suggestion that was not responded to) was coming from the perspective that the issues in the first thread had not been fully addressed. As I was writing that response, I felt that I might as well do a review, even though I thought the previous thread was dangling. Which led to a lot of issues and a few more emails pointing them out. > I totally would have accepted a high level "No, that won't fly > because...". Or a high level "This and that would need a change". Or > something like that. I intentionally sent this out as RFC because I know > there is some testing missing. I wanted to know if it is worth taking > further steps with this patch. > > So, I simply wanted to know if you (still) have fundamental issues with > the patch? It would have been good if you had simply stated so in exactly those words. I did not read the original email as saying that. I read the original email as saying (my paraphrase) "please apply it". You did _not_ use the words that I paraphrased, you actually said "So what about applying it?". I misunderstood what you were trying to say. I apologize for that. > That needs to be discussed first before we go into coding > details. I think it is fine to not apply it if there are reasons. I'd > like to know them, however, for a better understanding. Too late now. :-) I've already done the reviewing and provided all of the reasons that are show stoppers for the patch and how to fix. One thing that I did not mention in this thread is that I have an aversion to using ftrace for what is purely debugging data (which this is) because there is a danger that trace points become user space ABI. That is a whole long discussion that we do not have to have because I am not subjecting this patch to that objection. > For me, this is a super-super-side project, so if it causes too much > hazzle, I just leave it here and know interested people can find it > easier now. But if it could be applied with a sane amount of effort, I > was offering that. > > Was that understandable? I think so. And in return? We can always talk more at the next conference if you want. -Frank > > Kind regards, > >Wolfram >
Re: [RFC PATCH v2 1/1] of: introduce event tracepoints for dynamic device_node lifecyle
On 01/25/18 14:40, Tyrel Datwyler wrote: > On 01/24/2018 10:48 PM, Frank Rowand wrote: >> On 01/21/18 06:31, Wolfram Sang wrote: >>> From: Tyrel Datwyler >>> >>> This patch introduces event tracepoints for tracking a device_nodes >>> reference cycle as well as reconfig notifications generated in response >>> to node/property manipulations. >>> >>> With the recent upstreaming of the refcount API several device_node >>> underflows and leaks have come to my attention in the pseries (DLPAR) >>> dynamic logical partitioning code (ie. POWER speak for hotplugging >>> virtual and physcial resources at runtime such as cpus or IOAs). These >>> tracepoints provide a easy and quick mechanism for validating the >>> reference counting of device_nodes during their lifetime. >>> >>> Further, when pseries lpars are migrated to a different machine we >>> perform a live update of our device tree to bring it into alignment with >>> the configuration of the new machine. The of_reconfig_notify trace point >>> provides a mechanism that can be turned for debuging the device tree >>> modifications with out having to build a custom kernel to get at the >>> DEBUG code introduced by commit 00aa37206e1a54 ("of/reconfig: Add debug >>> output for OF_RECONFIG notifiers"). >>> >>> The following trace events are provided: of_node_get, of_node_put, >>> of_node_release, and of_reconfig_notify. These trace points require a >> >> Please add a note that the of_reconfig_notify trace event is not an >> added bit of debug info, but is instead replacing information that >> was previously available via pr_debug() when DEBUG was defined. >> >> >>> kernel built with ftrace support to be enabled. In a typical environment >>> where debugfs is mounted at /sys/kernel/debug the entire set of >>> tracepoints can be set with the following: >>> >>> echo "of:*" > /sys/kernel/debug/tracing/set_event >>> >>> or >>> >>> echo 1 > /sys/kernel/debug/tracing/events/of/enable >>> >>> The following shows the trace point data from a DLPAR remove of a cpu >>> from a pseries lpar: >>> >>> cat /sys/kernel/debug/tracing/trace | grep "POWER8@10" >>> >>> cpuhp/23-147 [023] 128.324827: >>> of_node_put: refcount=5, dn->full_name=/cpus/PowerPC,POWER8@10 >>> cpuhp/23-147 [023] 128.324829: >>> of_node_put: refcount=4, dn->full_name=/cpus/PowerPC,POWER8@10 >>> cpuhp/23-147 [023] 128.324829: >>> of_node_put: refcount=3, dn->full_name=/cpus/PowerPC,POWER8@10 >>> cpuhp/23-147 [023] 128.324831: >>> of_node_put: refcount=2, dn->full_name=/cpus/PowerPC,POWER8@10 >>>drmgr-7284 [009] 128.439000: >>> of_node_put: refcount=1, dn->full_name=/cpus/PowerPC,POWER8@10 >>>drmgr-7284 [009] 128.439002: >>> of_reconfig_notify: action=DETACH_NODE, >>> dn->full_name=/cpus/PowerPC,POWER8@10, >>> prop->name=null, old_prop->name=null >>>drmgr-7284 [009] 128.439015: >>> of_node_put: refcount=0, dn->full_name=/cpus/PowerPC,POWER8@10 >>>drmgr-7284 [009] 128.439016: >>> of_node_release: dn->full_name=/cpus/PowerPC,POWER8@10, dn->_flags=4 >>> >>> Signed-off-by: Tyrel Datwyler >> >> The following belongs in a list of version 2 changes, below the "---" line: >> >>> [wsa: fixed commit abbrev and one of the sysfs paths in commit desc, >>> removed trailing space and fixed pointer declaration in code] >> >>> Signed-off-by: Wolfram Sang >>> --- >>> drivers/of/dynamic.c | 32 ++-- >>> include/trace/events/of.h | 93 >>> +++ >>> 2 files changed, 105 insertions(+), 20 deletions(-) >>> create mode 100644 include/trace/events/of.h >> >> mode looks incorrect. Existing files in include/trace/events/ are -rw-rw >> >> >>> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c >>> index ab988d88704da0..b0d6ab5a35b8c6 100644 >>> --- a/drivers/of/dynamic.c >>> +++ b/drivers/of/dynamic.c >>> @@ -21,6 +21,9 @@ static struct device_node *kobj_to_device_node(struct >>> kobject *kobj) >>> return container_of(kobj, struct device_node, kobj); >>>
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
Hi Wolfram, On 01/25/18 03:03, Steven Rostedt wrote: > On Wed, 24 Jan 2018 22:55:13 -0800 > Frank Rowand wrote: > >> Hi Steve, > >> >> Off the top of your head, can you tell me know early in the boot >> process a trace_event can be called and successfully provide the >> data to someone trying to debug early boot issues? > > The trace events are enabled by early_initcall(). < snip > This means that ftrace can not be used for the of_node_get(), of_node_put(), and of_node_release() debug info, because these functions are called before early_initcall(). Please use pr_debug() for these functions. As far as I know, the of_reconfig_notify() could remain an ftrace instrumented function. But now that the only thing that would be ftrace instrumented is of_reconfig_notify(), I don't see a strong justification for changing the existing pr_debug() calls to an ftrace alternative. Though I suspect the original author of the patch still might desire to have the "#ifdef DEBUG" surrounding the pr_debug() calls removed since one of his issues was having to recompile his kernel to do his debugging. -Frank
Re: [RFC PATCH v2 1/1] of: introduce event tracepoints for dynamic device_node lifecyle
On 01/21/18 06:31, Wolfram Sang wrote: > From: Tyrel Datwyler > > This patch introduces event tracepoints for tracking a device_nodes > reference cycle as well as reconfig notifications generated in response > to node/property manipulations. > > With the recent upstreaming of the refcount API several device_node > underflows and leaks have come to my attention in the pseries (DLPAR) > dynamic logical partitioning code (ie. POWER speak for hotplugging > virtual and physcial resources at runtime such as cpus or IOAs). These > tracepoints provide a easy and quick mechanism for validating the > reference counting of device_nodes during their lifetime. > > Further, when pseries lpars are migrated to a different machine we > perform a live update of our device tree to bring it into alignment with > the configuration of the new machine. The of_reconfig_notify trace point > provides a mechanism that can be turned for debuging the device tree > modifications with out having to build a custom kernel to get at the > DEBUG code introduced by commit 00aa37206e1a54 ("of/reconfig: Add debug > output for OF_RECONFIG notifiers"). > > The following trace events are provided: of_node_get, of_node_put, > of_node_release, and of_reconfig_notify. These trace points require a > kernel built with ftrace support to be enabled. In a typical environment > where debugfs is mounted at /sys/kernel/debug the entire set of > tracepoints can be set with the following: > > echo "of:*" > /sys/kernel/debug/tracing/set_event > > or > > echo 1 > /sys/kernel/debug/tracing/events/of/enable > > The following shows the trace point data from a DLPAR remove of a cpu > from a pseries lpar: > > cat /sys/kernel/debug/tracing/trace | grep "POWER8@10" > > cpuhp/23-147 [023] 128.324827: > of_node_put: refcount=5, dn->full_name=/cpus/PowerPC,POWER8@10 > cpuhp/23-147 [023] 128.324829: > of_node_put: refcount=4, dn->full_name=/cpus/PowerPC,POWER8@10 > cpuhp/23-147 [023] 128.324829: > of_node_put: refcount=3, dn->full_name=/cpus/PowerPC,POWER8@10 > cpuhp/23-147 [023] 128.324831: > of_node_put: refcount=2, dn->full_name=/cpus/PowerPC,POWER8@10 >drmgr-7284 [009] 128.439000: > of_node_put: refcount=1, dn->full_name=/cpus/PowerPC,POWER8@10 >drmgr-7284 [009] 128.439002: > of_reconfig_notify: action=DETACH_NODE, > dn->full_name=/cpus/PowerPC,POWER8@10, > prop->name=null, old_prop->name=null >drmgr-7284 [009] 128.439015: > of_node_put: refcount=0, dn->full_name=/cpus/PowerPC,POWER8@10 >drmgr-7284 [009] 128.439016: > of_node_release: dn->full_name=/cpus/PowerPC,POWER8@10, dn->_flags=4 > > Signed-off-by: Tyrel Datwyler > [wsa: fixed commit abbrev and one of the sysfs paths in commit desc, > removed trailing space and fixed pointer declaration in code] > Signed-off-by: Wolfram Sang > --- > drivers/of/dynamic.c | 32 ++-- > include/trace/events/of.h | 93 > +++ > 2 files changed, 105 insertions(+), 20 deletions(-) > create mode 100644 include/trace/events/of.h > > diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c > index ab988d88704da0..b0d6ab5a35b8c6 100644 > --- a/drivers/of/dynamic.c > +++ b/drivers/of/dynamic.c > @@ -21,6 +21,9 @@ static struct device_node *kobj_to_device_node(struct > kobject *kobj) > return container_of(kobj, struct device_node, kobj); > } > > +#define CREATE_TRACE_POINTS > +#include > + > /** > * of_node_get() - Increment refcount of a node > * @node:Node to inc refcount, NULL is supported to simplify writing of > @@ -30,8 +33,10 @@ static struct device_node *kobj_to_device_node(struct > kobject *kobj) > */ > struct device_node *of_node_get(struct device_node *node) > { > - if (node) > + if (node) { > kobject_get(&node->kobj); > + trace_of_node_get(refcount_read(&node->kobj.kref.refcount), > node->full_name); > + } > return node; > } > EXPORT_SYMBOL(of_node_get); > @@ -43,8 +48,10 @@ EXPORT_SYMBOL(of_node_get); > */ > void of_node_put(struct device_node *node) > { > - if (node) > + if (node) { > + trace_of_node_put(refcount_read(&node->kobj.kref.refcount) - 1, > node->full_name); > kobject_put(&node->kobj); > + } > } > EXPORT_SYMBOL(of_node_put); > > @@ -75,24 +82,7 @@ const char *action_names[] = { > int of_reconfig_notify(unsigned long action, struct of_reconfig_data *p) > { > int rc; > -#ifdef DEBUG > - struct of_reconfig_data *pr = p; > - > - switch (action) { > - case OF_RECONFIG_ATTACH_NODE: > - case OF_RECONFIG_DETACH_NODE: > - pr_debug("notify %-15s %pOF\n", action_names[action], > - pr->dn); > - break; > - case OF_RECONFIG_ADD_PROPERTY: > - case OF_RECONFIG_REMOVE_PROPERTY: > - ca
Re: [RFC PATCH v2 1/1] of: introduce event tracepoints for dynamic device_node lifecyle
On 01/25/18 00:01, Geert Uytterhoeven wrote: > Hi Frank, > > On Thu, Jan 25, 2018 at 7:48 AM, Frank Rowand wrote: >>> create mode 100644 include/trace/events/of.h >> >> mode looks incorrect. Existing files in include/trace/events/ are -rw-rw > > Not in my git clone ;-) 644 should be fine. > > Gr{oetje,eeting}s, > > Geert Thanks!!! I just learned something new about git. Now I have to go experiment a bit to make sure I learned properly. -Frank
Re: [RFC PATCH v2 1/1] of: introduce event tracepoints for dynamic device_node lifecyle
On 01/24/18 22:48, Frank Rowand wrote: > On 01/21/18 06:31, Wolfram Sang wrote: >> From: Tyrel Datwyler >> >> This patch introduces event tracepoints for tracking a device_nodes >> reference cycle as well as reconfig notifications generated in response >> to node/property manipulations. >> >> With the recent upstreaming of the refcount API several device_node >> underflows and leaks have come to my attention in the pseries (DLPAR) >> dynamic logical partitioning code (ie. POWER speak for hotplugging >> virtual and physcial resources at runtime such as cpus or IOAs). These >> tracepoints provide a easy and quick mechanism for validating the >> reference counting of device_nodes during their lifetime. >> >> Further, when pseries lpars are migrated to a different machine we >> perform a live update of our device tree to bring it into alignment with >> the configuration of the new machine. The of_reconfig_notify trace point >> provides a mechanism that can be turned for debuging the device tree >> modifications with out having to build a custom kernel to get at the >> DEBUG code introduced by commit 00aa37206e1a54 ("of/reconfig: Add debug >> output for OF_RECONFIG notifiers"). >> >> The following trace events are provided: of_node_get, of_node_put, >> of_node_release, and of_reconfig_notify. These trace points require a > > Please add a note that the of_reconfig_notify trace event is not an > added bit of debug info, but is instead replacing information that > was previously available via pr_debug() when DEBUG was defined. I got a little carried away, "when DEBUG was defined" is extra un-needed detail for the commit message. > > >> kernel built with ftrace support to be enabled. In a typical environment >> where debugfs is mounted at /sys/kernel/debug the entire set of >> tracepoints can be set with the following: >> >> echo "of:*" > /sys/kernel/debug/tracing/set_event >> >> or >> >> echo 1 > /sys/kernel/debug/tracing/events/of/enable >> >> The following shows the trace point data from a DLPAR remove of a cpu >> from a pseries lpar: >> >> cat /sys/kernel/debug/tracing/trace | grep "POWER8@10" >> >> cpuhp/23-147 [023] 128.324827: >> of_node_put: refcount=5, dn->full_name=/cpus/PowerPC,POWER8@10 >> cpuhp/23-147 [023] 128.324829: >> of_node_put: refcount=4, dn->full_name=/cpus/PowerPC,POWER8@10 >> cpuhp/23-147 [023] 128.324829: >> of_node_put: refcount=3, dn->full_name=/cpus/PowerPC,POWER8@10 >> cpuhp/23-147 [023] 128.324831: >> of_node_put: refcount=2, dn->full_name=/cpus/PowerPC,POWER8@10 >>drmgr-7284 [009] 128.439000: >> of_node_put: refcount=1, dn->full_name=/cpus/PowerPC,POWER8@10 >>drmgr-7284 [009] 128.439002: >> of_reconfig_notify: action=DETACH_NODE, >> dn->full_name=/cpus/PowerPC,POWER8@10, >> prop->name=null, old_prop->name=null >>drmgr-7284 [009] 128.439015: >> of_node_put: refcount=0, dn->full_name=/cpus/PowerPC,POWER8@10 >>drmgr-7284 [009] 128.439016: >> of_node_release: dn->full_name=/cpus/PowerPC,POWER8@10, dn->_flags=4 >> >> Signed-off-by: Tyrel Datwyler > > The following belongs in a list of version 2 changes, below the "---" line: > >> [wsa: fixed commit abbrev and one of the sysfs paths in commit desc, >> removed trailing space and fixed pointer declaration in code] > >> Signed-off-by: Wolfram Sang >> --- >> drivers/of/dynamic.c | 32 ++-- >> include/trace/events/of.h | 93 >> +++ >> 2 files changed, 105 insertions(+), 20 deletions(-) >> create mode 100644 include/trace/events/of.h > > mode looks incorrect. Existing files in include/trace/events/ are -rw-rw > > >> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c >> index ab988d88704da0..b0d6ab5a35b8c6 100644 >> --- a/drivers/of/dynamic.c >> +++ b/drivers/of/dynamic.c >> @@ -21,6 +21,9 @@ static struct device_node *kobj_to_device_node(struct >> kobject *kobj) >> return container_of(kobj, struct device_node, kobj); >> } >> >> +#define CREATE_TRACE_POINTS >> +#include >> + >> /** >> * of_node_get() - Increment refcount of a node >> * @node: Node to inc refcount, NULL is supported to simplify writing of >> @@ -30,8 +33,10 @@ static struct
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
Hi Steve, On 01/21/18 06:31, Wolfram Sang wrote: > I got a bug report for a DT node refcounting problem in the I2C subsystem. > This > patch was a huge help in validating the bug report and the proposed solution. > So, I thought I bring it to attention again. Thanks Tyrel, for the initial > work! > > Note that I did not test the dynamic updates, only of_node_{get|put} so far. I > read that Tyrel checked dynamic updates extensively with this patch. And since > DT overlays are also used within our Renesas dev team, this will help there, > as > well. > > Tested on a Renesas Salvator-XS board (R-Car H3). > > Changes since RFC v1: > * rebased to v4.15-rc8 > * fixed commit abbrev and one of the sysfs paths in commit desc > * removed trailing space and fixed pointer declaration in code > > I consider all the remaining checkpatch issues irrelevant for this patch. > > So what about applying it? > > Kind regards, > >Wolfram > > > Tyrel Datwyler (1): > of: introduce event tracepoints for dynamic device_node lifecyle > > drivers/of/dynamic.c | 32 ++-- > include/trace/events/of.h | 93 > +++ > 2 files changed, 105 insertions(+), 20 deletions(-) > create mode 100644 include/trace/events/of.h > Off the top of your head, can you tell me know early in the boot process a trace_event can be called and successfully provide the data to someone trying to debug early boot issues? Also, way back when version 1 of this patch was being discussed, a question about stacktrace triggers: >>> # echo stacktrace > /sys/kernel/debug/tracing/trace_options >>> # cat trace | grep -A6 "/pci@8002018" >> >> Just to let you know that there is now stacktrace event triggers, where >> you don't need to stacktrace all events, you can pick and choose. And >> even filter the stack trace on specific fields of the event. > > This is great, and I did figure that out this afternoon. One thing I was > still trying to determine though was whether its possible to set these > triggers at boot? As far as I could tell I'm still limited to > "trace_options=stacktrace" as a kernel boot parameter to get the stack > for event tracepoints. No not yet. But I'll add that to the todo list. Thanks, -- Steve Is this still on your todo list, or is it now available? Thanks, Frank
Re: [RFC PATCH v2 1/1] of: introduce event tracepoints for dynamic device_node lifecyle
On 01/21/18 06:31, Wolfram Sang wrote: > From: Tyrel Datwyler > > This patch introduces event tracepoints for tracking a device_nodes > reference cycle as well as reconfig notifications generated in response > to node/property manipulations. > > With the recent upstreaming of the refcount API several device_node > underflows and leaks have come to my attention in the pseries (DLPAR) > dynamic logical partitioning code (ie. POWER speak for hotplugging > virtual and physcial resources at runtime such as cpus or IOAs). These > tracepoints provide a easy and quick mechanism for validating the > reference counting of device_nodes during their lifetime. > > Further, when pseries lpars are migrated to a different machine we > perform a live update of our device tree to bring it into alignment with > the configuration of the new machine. The of_reconfig_notify trace point > provides a mechanism that can be turned for debuging the device tree > modifications with out having to build a custom kernel to get at the > DEBUG code introduced by commit 00aa37206e1a54 ("of/reconfig: Add debug > output for OF_RECONFIG notifiers"). > > The following trace events are provided: of_node_get, of_node_put, > of_node_release, and of_reconfig_notify. These trace points require a Please add a note that the of_reconfig_notify trace event is not an added bit of debug info, but is instead replacing information that was previously available via pr_debug() when DEBUG was defined. > kernel built with ftrace support to be enabled. In a typical environment > where debugfs is mounted at /sys/kernel/debug the entire set of > tracepoints can be set with the following: > > echo "of:*" > /sys/kernel/debug/tracing/set_event > > or > > echo 1 > /sys/kernel/debug/tracing/events/of/enable > > The following shows the trace point data from a DLPAR remove of a cpu > from a pseries lpar: > > cat /sys/kernel/debug/tracing/trace | grep "POWER8@10" > > cpuhp/23-147 [023] 128.324827: > of_node_put: refcount=5, dn->full_name=/cpus/PowerPC,POWER8@10 > cpuhp/23-147 [023] 128.324829: > of_node_put: refcount=4, dn->full_name=/cpus/PowerPC,POWER8@10 > cpuhp/23-147 [023] 128.324829: > of_node_put: refcount=3, dn->full_name=/cpus/PowerPC,POWER8@10 > cpuhp/23-147 [023] 128.324831: > of_node_put: refcount=2, dn->full_name=/cpus/PowerPC,POWER8@10 >drmgr-7284 [009] 128.439000: > of_node_put: refcount=1, dn->full_name=/cpus/PowerPC,POWER8@10 >drmgr-7284 [009] 128.439002: > of_reconfig_notify: action=DETACH_NODE, > dn->full_name=/cpus/PowerPC,POWER8@10, > prop->name=null, old_prop->name=null >drmgr-7284 [009] 128.439015: > of_node_put: refcount=0, dn->full_name=/cpus/PowerPC,POWER8@10 >drmgr-7284 [009] 128.439016: > of_node_release: dn->full_name=/cpus/PowerPC,POWER8@10, dn->_flags=4 > > Signed-off-by: Tyrel Datwyler The following belongs in a list of version 2 changes, below the "---" line: > [wsa: fixed commit abbrev and one of the sysfs paths in commit desc, > removed trailing space and fixed pointer declaration in code] > Signed-off-by: Wolfram Sang > --- > drivers/of/dynamic.c | 32 ++-- > include/trace/events/of.h | 93 > +++ > 2 files changed, 105 insertions(+), 20 deletions(-) > create mode 100644 include/trace/events/of.h mode looks incorrect. Existing files in include/trace/events/ are -rw-rw > diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c > index ab988d88704da0..b0d6ab5a35b8c6 100644 > --- a/drivers/of/dynamic.c > +++ b/drivers/of/dynamic.c > @@ -21,6 +21,9 @@ static struct device_node *kobj_to_device_node(struct > kobject *kobj) > return container_of(kobj, struct device_node, kobj); > } > > +#define CREATE_TRACE_POINTS > +#include > + > /** > * of_node_get() - Increment refcount of a node > * @node:Node to inc refcount, NULL is supported to simplify writing of > @@ -30,8 +33,10 @@ static struct device_node *kobj_to_device_node(struct > kobject *kobj) > */ > struct device_node *of_node_get(struct device_node *node) > { > - if (node) > + if (node) { > kobject_get(&node->kobj); > + trace_of_node_get(refcount_read(&node->kobj.kref.refcount), > node->full_name); See the comment from Ron that I mentioned in my previous email. Also, the path has been removed from node->full_name. Does using it here still give all of the information that is desired? Same for all others uses of full_name in this patch. The trace point should have a single argument, node. Accessing the two fields can be done in the tracepoint assignment. Or is there some reason that can't be done? Same for the trace_of_node_put() tracepoint. > + } > return node; > } > EXPORT_SYMBOL(of_node_get); > @@ -43,8 +48,10 @@ EXPORT_SYMBOL(of_node_get);
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
On 01/21/18 06:31, Wolfram Sang wrote: > I got a bug report for a DT node refcounting problem in the I2C subsystem. > This > patch was a huge help in validating the bug report and the proposed solution. > So, I thought I bring it to attention again. Thanks Tyrel, for the initial > work! > > Note that I did not test the dynamic updates, only of_node_{get|put} so far. I > read that Tyrel checked dynamic updates extensively with this patch. And since > DT overlays are also used within our Renesas dev team, this will help there, > as > well. It's been nine months since version 1. If you are going to include the dynamic updates part of the patch then please test them. > Tested on a Renesas Salvator-XS board (R-Car H3). > > Changes since RFC v1: > * rebased to v4.15-rc8 > * fixed commit abbrev and one of the sysfs paths in commit desc > * removed trailing space and fixed pointer declaration in code > > I consider all the remaining checkpatch issues irrelevant for this patch. I am OK with the line length warnings in this patch. Why can't the macro error be fixed? A file entry needs to be added to MAINTAINERS. > > So what about applying it? > > Kind regards, > >Wolfram > > > Tyrel Datwyler (1): > of: introduce event tracepoints for dynamic device_node lifecyle > > drivers/of/dynamic.c | 32 ++-- > include/trace/events/of.h | 93 > +++ > 2 files changed, 105 insertions(+), 20 deletions(-) > create mode 100644 include/trace/events/of.h >
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
On 01/22/18 03:49, Wolfram Sang wrote: > Hi Frank, > >> Please go back and read the thread for version 1. Simply resubmitting a >> forward port is ignoring that whole conversation. >> >> There is a lot of good info in that thread. I certainly learned stuff in it. > > Yes, I did that and learned stuff, too. My summary of the discussion was: > > - you mentioned some drawbacks you saw (like the mixture of trace output > and printk output)> - most of them look like addressed to me? (e.g. Steven > showed a way to redirect > printk to trace > - you posted your version (which was, however, marked as "not user friendly" > even by yourself) Not exactly a fair quoting. There were two things I said: "Here is a patch that I have used. It is not as user friendly in terms of human readable stack traces (though a very small user space program should be able to fix that)." So easy to fix using existing userspace programs to convert kernel addresses to symbols. "FIXME: Currently using pr_err() so I don't need to set loglevel on boot. So obviously not a user friendly tool!!! The process is: - apply patch - configure, build, boot kernel - analyze data - remove patch" So not friendly because it uses pr_err() instead of pr_debug(). In a reply I said if I submitted my patches I would change it to use pr_debug() instead. So not an issue. And not user friendly because it requires patching the kernel. Again a NOP if I submitted my patch, because the patch would already be in the kernel. But whatever, let's ignore that - a poor quoting is not a reason to reject this version of the patch. > - The discussion stalled over having two approaches Then you should have stated such when you resubmitted. > So, I thought reposting would be a good way of finding out if your > concerns were addressed in the discussion or not. If I overlooked Then you should have stated that there were concerns raised in the discussion and asked me if my concerns were addressed. > something, I am sorry for that. Still, my intention is to continue the > discussion, not to ignore it. Because as it stands, we don't have such a > debugging mechanism in place currently, and with people working with DT > overlays, I'd think it would be nice to have. > > Kind regards, > >Wolfram > Rob suggested: > > @@ -25,8 +28,10 @@ > */ > struct device_node *of_node_get(struct device_node *node) > { > - if (node) > + if (node) { > kobject_get(&node->kobj); > + trace_of_node_get(refcount_read(&node->kobj.kref.refcount), node->full_name); Seems like there should be a kobj wrapper to read the refcount. As far as I noticed, that was never addressed. I don't know the answer, but the question was asked. And if there is no such function, then there is at least kref_read(), which would improve the code a little bit. I'll reply to the patch 0/1 and patch 1/1 emails with review comments. -Frank
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
On 01/23/18 04:11, Michael Ellerman wrote: > Wolfram Sang writes: > >> Hi Frank, >> >>> Please go back and read the thread for version 1. Simply resubmitting a >>> forward port is ignoring that whole conversation. >>> >>> There is a lot of good info in that thread. I certainly learned stuff in >>> it. >> >> Yes, I did that and learned stuff, too. My summary of the discussion was: >> >> - you mentioned some drawbacks you saw (like the mixture of trace output >> and printk output) >> - most of them look like addressed to me? (e.g. Steven showed a way to >> redirect >> printk to trace) >> - you posted your version (which was, however, marked as "not user friendly" >> even by yourself) >> - The discussion stalled over having two approaches >> >> So, I thought reposting would be a good way of finding out if your >> concerns were addressed in the discussion or not. If I overlooked >> something, I am sorry for that. Still, my intention is to continue the >> discussion, not to ignore it. Because as it stands, we don't have such a >> debugging mechanism in place currently, and with people working with DT >> overlays, I'd think it would be nice to have. > > Yeah I agree with all of that, I didn't think there were really any > concerns left outstanding. These trace points are very useful, I've > twice added them to a kernel to debug something, so it would be great > for them to be in mainline. > > cheers > Yes, I believe there are concerns outstanding. I'll try to read through the whole thread today to make sure I'm not missing anything. -Frank
Re: [RFC PATCH v2 0/1] of: easier debugging for node life cycle issues
On 01/21/18 06:31, Wolfram Sang wrote: > I got a bug report for a DT node refcounting problem in the I2C subsystem. > This > patch was a huge help in validating the bug report and the proposed solution. > So, I thought I bring it to attention again. Thanks Tyrel, for the initial > work! > > Note that I did not test the dynamic updates, only of_node_{get|put} so far. I > read that Tyrel checked dynamic updates extensively with this patch. And since > DT overlays are also used within our Renesas dev team, this will help there, > as > well. > > Tested on a Renesas Salvator-XS board (R-Car H3). > > Changes since RFC v1: > * rebased to v4.15-rc8 > * fixed commit abbrev and one of the sysfs paths in commit desc > * removed trailing space and fixed pointer declaration in code > > I consider all the remaining checkpatch issues irrelevant for this patch. > > So what about applying it? > > Kind regards, > >Wolfram > > > Tyrel Datwyler (1): > of: introduce event tracepoints for dynamic device_node lifecyle > > drivers/of/dynamic.c | 32 ++-- > include/trace/events/of.h | 93 > +++ > 2 files changed, 105 insertions(+), 20 deletions(-) > create mode 100644 include/trace/events/of.h > Please go back and read the thread for version 1. Simply resubmitting a forward port is ignoring that whole conversation. There is a lot of good info in that thread. I certainly learned stuff in it. Thanks, Frank
Re: [PATCH v2 5/5] of/fdt: only store the device node basename in full_name
On 10/19/17 13:06, Moritz Fischer wrote: < snip > > We also have plenty of code that is just not aware of overlays, and > assumes certain parts of the tree to stay static. I would state that somewhat differently. :-) There is very little code that is aware of overlays, and most code assumes the device tree does not change after early boot. This is one of the areas where the creation of overlays needs to be done with care. > I ran into that issue when I tried to add thermal zones via an overlay, > I've been investigating how to fix the thermal framework to work with > overlays since then and have some partially working code. > Currently the thermal framework parses the thermal-zones node at boot, > and assumes this stays static. This breaks with overlays. > > I agree we eventually need to fix the parts that break when we use > overlays.
Re: [PATCH v2 5/5] of/fdt: only store the device node basename in full_name
On 10/18/17 11:30, Rob Herring wrote: > On Wed, Oct 18, 2017 at 10:53 AM, Pantelis Antoniou > wrote: >> On Wed, 2017-10-18 at 10:44 -0500, Rob Herring wrote: >>> On Wed, Oct 18, 2017 at 10:12 AM, Alan Tull wrote: >>>> On Tue, Oct 17, 2017 at 6:51 PM, Frank Rowand >>>> wrote: >>>>> On 10/17/17 14:46, Rob Herring wrote: >>>>>> On Tue, Oct 17, 2017 at 4:32 PM, Alan Tull wrote: >>>>>>> On Mon, Aug 21, 2017 at 10:16 AM, Rob Herring wrote: >>>>>>> >>>>>>> Hi Rob, >>>>>>> >>>>>>>> With dependencies on a statically allocated full path name converted to >>>>>>>> use %pOF format specifier, we can store just the basename of node, and >>>>>>>> the unflattening of the FDT can be simplified. >>>>>>>> >>>>>>>> This commit will affect the remaining users of full_name. After >>>>>>>> analyzing these users, the remaining cases should only change some >>>>>>>> print >>>>>>>> messages. The main users of full_name are providing a name for struct >>>>>>>> resource. The resource names shouldn't be important other than >>>>>>>> providing >>>>>>>> /proc/iomem names. >>>>>>>> >>>>>>>> We no longer distinguish between pre and post 0x10 dtb formats as >>>>>>>> either >>>>>>>> a full path or basename will work. However, less than 0x10 formats have >>>>>>>> been broken since the conversion to use libfdt (and no one has cared). >>>>>>>> The conversion of the unflattening code to be non-recursive also broke >>>>>>>> pre 0x10 formats as the populate_node function would return 0 in that >>>>>>>> case. >>>>>>>> >>>>>>>> Signed-off-by: Rob Herring >>>>>>>> --- >>>>>>>> v2: >>>>>>>> - rebase to linux-next >>>>>>>> >>>>>>>> drivers/of/fdt.c | 69 >>>>>>>> +--- >>>>>>>> 1 file changed, 11 insertions(+), 58 deletions(-) >>>>>>> >>>>>>> I've just updated to the latest next branch and am finding problems >>>>>>> applying overlays. Reverting this commit alleviates things. The >>>>>>> errors I get are: >>>>>>> >>>>>>> [ 88.498704] OF: overlay: Failed to apply prop @/__symbols__/clk_0 >>>>>>> [ 88.513447] OF: overlay: apply failed '/__symbols__' >>>>>>> [ 88.518423] create_overlay: Failed to create overlay (err=-12) >>>>>> >>>>>> Frank's series with overlay updates should fix this. >>>>> >>>>> Yes, it does: >>>>> >>>>> [PATCH v3 11/12] of: overlay: remove a dependency on device node >>>>> full_name >>>> >>>> Thanks for the fast response. I fetched the dt/next branch to test >>>> this but there are sufficient changes that Pantelis' "OF: DT-Overlay >>>> configfs interface (v7)" is broken now. I've been adding that >>>> downstream since 4.4. We're using it as an interface for applying >>>> overlays to program FPGAs. If we fix it again, is there any chance >>>> that can go upstream now? >>> >>> With a drive-by posting once every few years, no. >>> >> >> I take offense to that. There's nothing changed in the patch for years. >> Reposting the same patch without changes would achieve nothing. > > Are you still expecting review comments on it or something? > Furthermore, If something is posted infrequently, then I'm not > inclined to comment or care if the next posting is going to be after I > forget what I previously said (which is not very long). > > I'm just saying, don't expect to forward port, post and it will be > accepted. Below is minimally one of the issues that needs to be > addressed. > >>> The issue remains that the kernel is not really setup to deal with any >>> random property or node to be changed at any point in run-time. I >>> think there needs to be some restrictions around what the overlays can >>> to
Re: [PATCH v2 5/5] of/fdt: only store the device node basename in full_name
On 10/18/17 11:39, Alan Tull wrote: > On Wed, Oct 18, 2017 at 10:53 AM, Pantelis Antoniou > wrote: >> On Wed, 2017-10-18 at 10:44 -0500, Rob Herring wrote: >>> On Wed, Oct 18, 2017 at 10:12 AM, Alan Tull wrote: >>>> On Tue, Oct 17, 2017 at 6:51 PM, Frank Rowand >>>> wrote: >>>>> On 10/17/17 14:46, Rob Herring wrote: >>>>>> On Tue, Oct 17, 2017 at 4:32 PM, Alan Tull wrote: >>>>>>> On Mon, Aug 21, 2017 at 10:16 AM, Rob Herring wrote: >>>>>>> >>>>>>> Hi Rob, >>>>>>> >>>>>>>> With dependencies on a statically allocated full path name converted to >>>>>>>> use %pOF format specifier, we can store just the basename of node, and >>>>>>>> the unflattening of the FDT can be simplified. >>>>>>>> >>>>>>>> This commit will affect the remaining users of full_name. After >>>>>>>> analyzing these users, the remaining cases should only change some >>>>>>>> print >>>>>>>> messages. The main users of full_name are providing a name for struct >>>>>>>> resource. The resource names shouldn't be important other than >>>>>>>> providing >>>>>>>> /proc/iomem names. >>>>>>>> >>>>>>>> We no longer distinguish between pre and post 0x10 dtb formats as >>>>>>>> either >>>>>>>> a full path or basename will work. However, less than 0x10 formats have >>>>>>>> been broken since the conversion to use libfdt (and no one has cared). >>>>>>>> The conversion of the unflattening code to be non-recursive also broke >>>>>>>> pre 0x10 formats as the populate_node function would return 0 in that >>>>>>>> case. >>>>>>>> >>>>>>>> Signed-off-by: Rob Herring >>>>>>>> --- >>>>>>>> v2: >>>>>>>> - rebase to linux-next >>>>>>>> >>>>>>>> drivers/of/fdt.c | 69 >>>>>>>> +--- >>>>>>>> 1 file changed, 11 insertions(+), 58 deletions(-) >>>>>>> >>>>>>> I've just updated to the latest next branch and am finding problems >>>>>>> applying overlays. Reverting this commit alleviates things. The >>>>>>> errors I get are: >>>>>>> >>>>>>> [ 88.498704] OF: overlay: Failed to apply prop @/__symbols__/clk_0 >>>>>>> [ 88.513447] OF: overlay: apply failed '/__symbols__' >>>>>>> [ 88.518423] create_overlay: Failed to create overlay (err=-12) >>>>>> >>>>>> Frank's series with overlay updates should fix this. >>>>> >>>>> Yes, it does: >>>>> >>>>> [PATCH v3 11/12] of: overlay: remove a dependency on device node >>>>> full_name >>>> >>>> Thanks for the fast response. I fetched the dt/next branch to test >>>> this but there are sufficient changes that Pantelis' "OF: DT-Overlay >>>> configfs interface (v7)" is broken now. I've been adding that >>>> downstream since 4.4. We're using it as an interface for applying >>>> overlays to program FPGAs. If we fix it again, is there any chance >>>> that can go upstream now? >>> >>> With a drive-by posting once every few years, no. >>> >> I take offense to that. There's nothing changed in the patch for years. >> Reposting the same patch without changes would achieve nothing. >> >>> The issue remains that the kernel is not really setup to deal with any >>> random property or node to be changed at any point in run-time. > > Yeah, I'm not super surprised :) I have some whitelist ideas below. > >>> I >>> think there needs to be some restrictions around what the overlays can >>> touch. We can't have it be wide open and then lock things down later >>> and break users. One example of what you could do is you can only add >>> sub-trees to whitelisted nodes. That's probably acceptable for your >>> usecase. > > I can take a look at making OF_OVERLAY_PRE_APPLY and > OF_OVERLAY_PRE_REMOV
Re: [PATCH v2 5/5] of/fdt: only store the device node basename in full_name
On 10/17/17 14:46, Rob Herring wrote: > On Tue, Oct 17, 2017 at 4:32 PM, Alan Tull wrote: >> On Mon, Aug 21, 2017 at 10:16 AM, Rob Herring wrote: >> >> Hi Rob, >> >>> With dependencies on a statically allocated full path name converted to >>> use %pOF format specifier, we can store just the basename of node, and >>> the unflattening of the FDT can be simplified. >>> >>> This commit will affect the remaining users of full_name. After >>> analyzing these users, the remaining cases should only change some print >>> messages. The main users of full_name are providing a name for struct >>> resource. The resource names shouldn't be important other than providing >>> /proc/iomem names. >>> >>> We no longer distinguish between pre and post 0x10 dtb formats as either >>> a full path or basename will work. However, less than 0x10 formats have >>> been broken since the conversion to use libfdt (and no one has cared). >>> The conversion of the unflattening code to be non-recursive also broke >>> pre 0x10 formats as the populate_node function would return 0 in that >>> case. >>> >>> Signed-off-by: Rob Herring >>> --- >>> v2: >>> - rebase to linux-next >>> >>> drivers/of/fdt.c | 69 >>> +--- >>> 1 file changed, 11 insertions(+), 58 deletions(-) >> >> I've just updated to the latest next branch and am finding problems >> applying overlays. Reverting this commit alleviates things. The >> errors I get are: >> >> [ 88.498704] OF: overlay: Failed to apply prop @/__symbols__/clk_0 >> [ 88.513447] OF: overlay: apply failed '/__symbols__' >> [ 88.518423] create_overlay: Failed to create overlay (err=-12) > > Frank's series with overlay updates should fix this. Yes, it does: [PATCH v3 11/12] of: overlay: remove a dependency on device node full_name -Frank
Re: [PATCH v6 1/4] of: remove *phandle properties from expanded device tree
Hi Michael, In Rob's reply to you email, he said: I'd like to move towards dropping 'linux,phandle' including changing dtc to stop generating both properties by default. Perhaps we should just be more explicit that we are doing that. Stop exposing it first and then change how phandles are stored/managed a cycle later. Then we can easily revert it if needed. My current plan is to follow that suggestion. So fixing my patches that started this thread will wait for a later cycle. But I would like to try to address the issues with property "ibm,phandle" that you raised below _before_ fixing my patches. Those fixes should help me understand "ibm,phandle" better, and hopefully when I update my patches I will not cause further "ibm,phandle" issues. More comments from me, embedded below. On 06/20/17 21:57, Michael Ellerman wrote: > Hi Frank, > > frowand.l...@gmail.com writes: >> From: Frank Rowand >> >> Remove "phandle", "linux,phandle", and "ibm,phandle" properties from >> the internal device tree. The phandle will still be in the struct >> device_node phandle field and will still be displayed as if it is >> a property in /proc/device_tree. >> >> This is to resolve the issue found by Stephen Boyd [1] when he changed >> the type of struct property.value from void * to const void *. As >> a result of the type change, the overlay code had compile errors >> where the resolver updates phandle values. >> >> [1] http://lkml.iu.edu/hypermail/linux/kernel/1702.1/04160.html >> >> - Add sysfs infrastructure to report np->phandle, as if it was a property. >> - Do not create "phandle" "ibm,phandle", and "linux,phandle" properties >> in the expanded device tree. >> - Remove phandle properties in of_attach_node(), for nodes dynamically >> attached to the live tree. Add the phandle sysfs entry for these nodes. >> - When creating an overlay changeset, duplicate the node phandle in >> __of_node_dup(). >> - Remove no longer needed checks to exclude "phandle" and "linux,phandle" >> properties in several locations. >> - A side effect of these changes is that the obsolete "linux,phandle" and >> "ibm,phandle" properties will no longer appear in /proc/device-tree (they >> will appear as "phandle"). > > Sorry but I don't think that can work for us. > > Our DLPAR (ie. CPU/memory/device hotplug) stuff on PowerVM uses > "ibm,phandle", and it's not the same thing as "phandle" / > "linux,phandle". > > I don't know the code well myself, but the spec (PAPR) says: > > Note: If the “ibm,phandle” property exists, there are two “phandle” > namespaces which must be kept separate. One is that actually used by > the OF client interface, the other is properties in the device tree > making reference to device tree nodes. These requirements are written > to maintain backward compatibility with older FW versions predating > these requirements; if the “ibm,phandle” property is not present, the > OS may assume that any device tree properties which refer to this node > will have a phandle value matching that returned by client interface > services. > > I have systems here that still use "ibm,phandle". I also see at least > some of the userspace code that looks for "ibm,phandle", and nothing > else. > > The note above actually implies that the current Linux code is wrong, > when it uses "ibm,phandle" as the value of np->phandle. This is where I get lost, and need your help and expertise. If I do the naive search: git grep '"ibm,phandle"' I find it used in three files. (1) drivers/of/fdt.c, which is part of what my patches modified. (2) drivers/misc/cxl/of.c reads (and optionally reports the value of) "ibm,phandle", but does not use the value for anything else. (3) drivers/of/dynamic.c gets and uses the value of "ibm,phandle" only if properties "phandle" and "linux,phandle" do not exist. If either of those two properties exist, their value is used preferentially to the value of "ibm,phandle". So in this case it seems that folding the value of "ibm,phandle" into "phandle" would work properly. Is the "ibm,phandle" property used elsewhere in a way that my naive search missed it? Should it be used in a place that is not using it? > So sorry that's a big mess, but we can't just rip out those properties. You provide some useful references and information above, but I do not understand what I am
Re: [PATCH 0/4] Removing full paths from DT full_name
Hi Rob, On 07/25/17 14:44, Rob Herring wrote: > This series is the last steps to remove storing the full path for every > DT node. Instead, we can create full path strings dynamically as needed > with printf %pOF specifiers (commit ce4fecf1fe15). There are a number of > remaining direct users of full_name after this series. I don't believe > there should be any functional impact for those users with the change to > only the node name (+unit-address). The majority are for struct > resource.name. This should only affect /proc/iomem display. I added a new dependency on full_name in: [PATCH v4 3/3] of: overlay: add overlay symbols to live device tree You don't need to fix that -- I knew the removal of full_name was coming and expected to adapt to that change when it came, so I'll take care of it. It will probably take me two or three weeks to get to it, but that shouldn't be a big deal since the affected code is a new feature that is not yet used. -Frank > > Patches 1 and 2 can be applied now for 4.14. For patches 3 and 4, my > target is 4.15 after all the dependencies have been merged. > > PPC folks, Please test! The PPC parts are untested. A git branch with > all the dependencies is here[1]. > > Rob > > [1] git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git dt-printf > > Rob Herring (4): > powerpc: pseries: vio: match parent nodes with of_find_node_by_path > powerpc: pseries: remove dlpar_attach_node dependency on full path > powerpc: pseries: only store the device node basename in full_name > of/fdt: only store the device node basename in full_name > > arch/powerpc/platforms/pseries/dlpar.c | 26 +++ > arch/powerpc/platforms/pseries/hotplug-cpu.c | 2 +- > arch/powerpc/platforms/pseries/mobility.c| 2 +- > arch/powerpc/platforms/pseries/pseries.h | 2 +- > arch/powerpc/platforms/pseries/reconfig.c| 2 +- > arch/powerpc/platforms/pseries/vio.c | 4 +- > drivers/of/fdt.c | 69 > +--- > 7 files changed, 23 insertions(+), 84 deletions(-) >
Re: [PATCH v6 1/4] of: remove *phandle properties from expanded device tree
adding Ben and Paul. Hi Michael, On 06/20/17 21:57, Michael Ellerman wrote: > Hi Frank, > > frowand.l...@gmail.com writes: >> From: Frank Rowand >> >> Remove "phandle", "linux,phandle", and "ibm,phandle" properties from >> the internal device tree. The phandle will still be in the struct >> device_node phandle field and will still be displayed as if it is >> a property in /proc/device_tree. >> >> This is to resolve the issue found by Stephen Boyd [1] when he changed >> the type of struct property.value from void * to const void *. As >> a result of the type change, the overlay code had compile errors >> where the resolver updates phandle values. >> >> [1] http://lkml.iu.edu/hypermail/linux/kernel/1702.1/04160.html >> >> - Add sysfs infrastructure to report np->phandle, as if it was a property. >> - Do not create "phandle" "ibm,phandle", and "linux,phandle" properties >> in the expanded device tree. >> - Remove phandle properties in of_attach_node(), for nodes dynamically >> attached to the live tree. Add the phandle sysfs entry for these nodes. >> - When creating an overlay changeset, duplicate the node phandle in >> __of_node_dup(). >> - Remove no longer needed checks to exclude "phandle" and "linux,phandle" >> properties in several locations. >> - A side effect of these changes is that the obsolete "linux,phandle" and >> "ibm,phandle" properties will no longer appear in /proc/device-tree (they >> will appear as "phandle"). > > Sorry but I don't think that can work for us. > > Our DLPAR (ie. CPU/memory/device hotplug) stuff on PowerVM uses > "ibm,phandle", and it's not the same thing as "phandle" / > "linux,phandle". > > I don't know the code well myself, but the spec (PAPR) says: This is the LoPAPR, section 2.1.4, R1-2.1.4-3 https://members.openpowerfoundation.org/document/dl/469 > Note: If the “ibm,phandle” property exists, there are two “phandle” > namespaces which must be kept separate. One is that actually used by > the OF client interface, the other is properties in the device tree > making reference to device tree nodes. These requirements are written > to maintain backward compatibility with older FW versions predating > these requirements; if the “ibm,phandle” property is not present, the > OS may assume that any device tree properties which refer to this node > will have a phandle value matching that returned by client interface > services. > > I have systems here that still use "ibm,phandle". I also see at least > some of the userspace code that looks for "ibm,phandle", and nothing > else. > > The note above actually implies that the current Linux code is wrong, > when it uses "ibm,phandle" as the value of np->phandle. My interpretation of the LoPAPR R1-2.1.4-1 and R1-2.1.4-2 is that the ibm-phandle property it the node's phandle value that other nodes may refer to. Thus this is the value that should be placed in np->phandle, which is the value that will be used to find a node based on its phandle value. Which is the way the drivers/of/fdt.c currently works: /* We accept flattened tree phandles either in * ePAPR-style "phandle" properties, or the * legacy "linux,phandle" properties. If both * appear and have different values, things * will get weird. Don't do that. */ if (!strcmp(pname, "phandle") || !strcmp(pname, "linux,phandle")) { if (!np->phandle) np->phandle = be32_to_cpup(val); } /* And we process the "ibm,phandle" property * used in pSeries dynamic device tree * stuff */ if (!strcmp(pname, "ibm,phandle")) np->phandle = be32_to_cpup(val); My interpratation of R1-2.1.4-1 through R1-2.1.4-3 is that the "ibm,phandle" property is relevant to the contents of the Linux kernel device tree and that the "phandle returned by a client interface service" is not relevant to the Linux kernel device tree. I would not expect the powerpc code to expose the device tree code to a "phandle returned by a client interface service". Is that correct? The current code which chooses which value potentially ends up in np->phandle seems to involve a little bit of cargo cult coding. This code has been adapted
Re: [PATCH v6 1/4] of: remove *phandle properties from expanded device tree
On 06/20/17 23:18, Frank Rowand wrote: > Hi Rob, > > Michael has an issue that means this patch series is not OK in the > current form. I will work on a v7 to see if I can resolve the > issue. > > -Frank < snip > Hi Rob, The issue is in patch 1. Patches 2 - 4 are small independent patches that are not dependent on patch 1, so I just sent them as individual patches. Version 7 of this series will be just patch 1. -Frank
Re: [PATCH v6 1/4] of: remove *phandle properties from expanded device tree
Hi Rob, Michael has an issue that means this patch series is not OK in the current form. I will work on a v7 to see if I can resolve the issue. -Frank On 06/20/17 21:57, Michael Ellerman wrote: > Hi Frank, > > frowand.l...@gmail.com writes: >> From: Frank Rowand >> >> Remove "phandle", "linux,phandle", and "ibm,phandle" properties from >> the internal device tree. The phandle will still be in the struct >> device_node phandle field and will still be displayed as if it is >> a property in /proc/device_tree. >> >> This is to resolve the issue found by Stephen Boyd [1] when he changed >> the type of struct property.value from void * to const void *. As >> a result of the type change, the overlay code had compile errors >> where the resolver updates phandle values. >> >> [1] http://lkml.iu.edu/hypermail/linux/kernel/1702.1/04160.html >> >> - Add sysfs infrastructure to report np->phandle, as if it was a property. >> - Do not create "phandle" "ibm,phandle", and "linux,phandle" properties >> in the expanded device tree. >> - Remove phandle properties in of_attach_node(), for nodes dynamically >> attached to the live tree. Add the phandle sysfs entry for these nodes. >> - When creating an overlay changeset, duplicate the node phandle in >> __of_node_dup(). >> - Remove no longer needed checks to exclude "phandle" and "linux,phandle" >> properties in several locations. >> - A side effect of these changes is that the obsolete "linux,phandle" and >> "ibm,phandle" properties will no longer appear in /proc/device-tree (they >> will appear as "phandle"). > > Sorry but I don't think that can work for us. > > Our DLPAR (ie. CPU/memory/device hotplug) stuff on PowerVM uses > "ibm,phandle", and it's not the same thing as "phandle" / > "linux,phandle". > > I don't know the code well myself, but the spec (PAPR) says: > > Note: If the “ibm,phandle” property exists, there are two “phandle” > namespaces which must be kept separate. One is that actually used by > the OF client interface, the other is properties in the device tree > making reference to device tree nodes. These requirements are written > to maintain backward compatibility with older FW versions predating > these requirements; if the “ibm,phandle” property is not present, the > OS may assume that any device tree properties which refer to this node > will have a phandle value matching that returned by client interface > services. > > I have systems here that still use "ibm,phandle". I also see at least > some of the userspace code that looks for "ibm,phandle", and nothing > else. > > The note above actually implies that the current Linux code is wrong, > when it uses "ibm,phandle" as the value of np->phandle. > > So sorry that's a big mess, but we can't just rip out those properties. > > I think the minimal change would be to treat "ibm,phandle" like a normal > property, I think that would allow our tools to keep working? > > > The other thing that worries me is that by renaming (effectively) > "linux,phandle" to "phandle", we lose the ability to accurately > regenerate the device tree from /proc/device-tree. In theory it > shouldn't matter, but I worry that in practice something will break. > > What if we just kept a single bit flag somewhere indicating if the name of > the phandle property we found was "phandle" or "linux,phandle", and > create the sysfs phandle using that name? > > cheers >
Re: [PATCH] of: update ePAPR references to point to Devicetree Specification
On 06/18/17 07:05, Rob Herring wrote: > On Tue, Jun 13, 2017 at 07:49:04PM -0700, frowand.l...@gmail.com wrote: >> From: Frank Rowand >> >> The Devicetree Specification has superseded the ePAPR as the >> base specification for bindings. Update files in Documentation >> to reference the new document. >> >> Some files are not updated because there is no hypervisor chapter >> in the Devicetree Specification: >>Documentation/devicetree/bindings/powerpc/fsl/msi-pic.txt >>Documenation/virtual/kvm/api.txt >>Documenation/virtual/kvm/ppc-pv.txt >> >> Signed-off-by: Frank Rowand >> --- >> Documentation/devicetree/bindings/arm/cci.txt | 12 >> ++-- >> Documentation/devicetree/bindings/arm/cpus.txt | 13 >> +++-- >> Documentation/devicetree/bindings/arm/idle-states.txt | 4 ++-- >> Documentation/devicetree/bindings/arm/l2c2x0.txt| 4 ++-- >> Documentation/devicetree/bindings/arm/topology.txt | 4 ++-- >> Documentation/devicetree/bindings/bus/simple-pm-bus.txt | 2 +- >> Documentation/devicetree/bindings/chosen.txt| 3 ++- >> Documentation/devicetree/bindings/common-properties.txt | 2 +- >> Documentation/devicetree/bindings/crypto/fsl-sec4.txt | 4 ++-- >> Documentation/devicetree/bindings/crypto/fsl-sec6.txt | 4 ++-- >> .../devicetree/bindings/interrupt-controller/open-pic.txt | 5 ++--- >> Documentation/devicetree/bindings/net/ethernet.txt | 9 ++--- >> Documentation/devicetree/bindings/powerpc/fsl/cpus.txt | 6 +++--- >> Documentation/devicetree/bindings/powerpc/fsl/l2cache.txt | 2 +- >> Documentation/devicetree/bindings/powerpc/fsl/srio-rmu.txt | 4 ++-- >> Documentation/devicetree/bindings/powerpc/fsl/srio.txt | 3 ++- >> Documentation/devicetree/booting-without-of.txt | 2 +- >> Documentation/devicetree/usage-model.txt| 2 +- >> Documentation/xtensa/mmu.txt| 6 +++--- >> 19 files changed, 48 insertions(+), 43 deletions(-) >> >> diff --git a/Documentation/devicetree/bindings/arm/cci.txt >> b/Documentation/devicetree/bindings/arm/cci.txt >> index 0f2153e8fa7e..cc7621b204f4 100644 >> --- a/Documentation/devicetree/bindings/arm/cci.txt >> +++ b/Documentation/devicetree/bindings/arm/cci.txt >> @@ -11,9 +11,9 @@ clusters, through memory mapped interface, with a global >> control register >> space and multiple sets of interface control registers, one per slave >> interface. >> >> -Bindings for the CCI node follow the ePAPR standard, available from: >> +Bindings for the CCI node follow the Devicetree Specification, available >> from: >> >> -www.power.org/documentation/epapr-version-1-1/ >> +https://www.devicetree.org/specifications/ > > Actually, I find this meaningless other than in the sense that *every* > binding follows the spec. This looks like cut-n-paste from the ARM cpus > binding which did follow the spec for cpus to some extent. Good point. I can remove the reference to the ePAPR from this file. >> >> with the addition of the bindings described in this document which are >> specific to ARM. >> @@ -50,10 +50,10 @@ specific to ARM. >> as a tuple of cells, containing child address, >> parent address and the size of the region in the >> child address space. >> -Definition: A standard property. Follow rules in the ePAPR for >> -hierarchical bus addressing. CCI interfaces >> -addresses refer to the parent node addressing >> -scheme to declare their register bases. >> +Definition: A standard property. Follow rules in the Devicetree >> +Specification for hierarchical bus addressing. CCI >> +interfaces addresses refer to the parent node >> +addressing scheme to declare their register bases. >> >> CCI interconnect node can define the following child nodes: >> >> diff --git a/Documentation/devicetree/bindings/arm/cpus.txt >> b/Documentation/devicetree/bindings/arm/cpus.txt >> index 1030f5f50207..283c520a2224 100644 >> --- a/Documentation/devicetree/bindings/arm/cpus.txt >> +++ b/Documentation/devicetree/bindings/arm/cpus.txt >> @@ -6,9 +6,9 @@ The device tree allows to describe the layout of CPUs in a >> system through >> the "cpus&quo
Re: [PATCH] of: introduce event tracepoints for dynamic device_node lifecyle
On 04/20/17 09:51, Tyrel Datwyler wrote: > On 04/19/2017 09:43 PM, Frank Rowand wrote: > < snip > >> The call stack could easily be post-processed, for example using addr2line. >> Here is the call stack for when the refcount incremented to 23 from 22 (or >> more accurately, to 22 from 21): >> >> 0xc0d00e3c Line 857 of "init/main.c" >> 0xc03017d0 Line 792 of "init/main.c" >> 0xc0d3a234 Line 528 of "drivers/of/platform.c" >> 0xc0810684 Line 503 of "drivers/of/platform.c" >> 0xc081061c Line 267 of "include/linux/of.h" >> 0xc080d928 Line 815 of "drivers/of/base.c" >> >> Which ends up being this code: >> >>of_platform_default_populate_init() >> of_platform_default_populate() >> of_platform_populate() >> [[ of_find_node_by_path("/") ]] >>[[ of_find_node_opts_by_path(path, NULL) ]] >> of_node_get(of_root) >> >> Note that some functions can be left out of the ARM call stack, with >> a return going back more than one level. The functions in the call >> list above that are enclosed in '[[' and ']]' were found by source >> inspection in those cases. > > The same thing is encountered in ppc64 stack traces. I assume it is > generally inlining of small functions, but I've never actually verified > that theory. Probably should take the time to investigate, or just ask > someone. Yes, inlining small functions is one reason for this. Another case I often find is that when function A calls function B calls function C. If the final statement of function B is 'return C()' then there is no need for function C to return through function B, it can instead return directly to function A. (That is more a conceptual hand-waving of the idea, not the actual way the compiler implements it. Take this with a grain of salt, I'm not a compiler guy.) < snip >