[RFC PATCH v1] powerpc/prom_init: disable XIVE in Secure VM.
XIVE is not correctly enabled for Secure VM in the KVM Hypervisor yet. Hence Secure VM, must always default to XICS interrupt controller. If XIVE is requested through kernel command line option "xive=on", override and turn it off. If XIVE is the only supported platform interrupt controller; specified through qemu option "ic-mode=xive", simply abort. Otherwise default to XICS. Cc: kvm-...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Michael Ellerman Cc: Thiago Jung Bauermann Cc: Michael Anderson Cc: Sukadev Bhattiprolu Cc: Alexey Kardashevskiy Cc: Paul Mackerras Cc: Greg Kurz Cc: Cedric Le Goater Cc: David Gibson Signed-off-by: Ram Pai --- arch/powerpc/kernel/prom_init.c | 43 - 1 file changed, 30 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c index 5773453..dd96c82 100644 --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -805,6 +805,18 @@ static void __init early_cmdline_parse(void) #endif } +#ifdef CONFIG_PPC_SVM + opt = prom_strstr(prom_cmd_line, "svm="); + if (opt) { + bool val; + + opt += sizeof("svm=") - 1; + if (!prom_strtobool(opt, )) + prom_svm_enable = val; + prom_printf("svm =%d\n", prom_svm_enable); + } +#endif /* CONFIG_PPC_SVM */ + #ifdef CONFIG_PPC_PSERIES prom_radix_disable = !IS_ENABLED(CONFIG_PPC_RADIX_MMU_DEFAULT); opt = prom_strstr(prom_cmd_line, "disable_radix"); @@ -823,23 +835,22 @@ static void __init early_cmdline_parse(void) if (prom_radix_disable) prom_debug("Radix disabled from cmdline\n"); - opt = prom_strstr(prom_cmd_line, "xive=off"); - if (opt) { +#ifdef CONFIG_PPC_SVM + if (prom_svm_enable) { prom_xive_disable = true; - prom_debug("XIVE disabled from cmdline\n"); + prom_debug("XIVE disabled in Secure VM\n"); } -#endif /* CONFIG_PPC_PSERIES */ - -#ifdef CONFIG_PPC_SVM - opt = prom_strstr(prom_cmd_line, "svm="); - if (opt) { - bool val; +#endif /* CONFIG_PPC_SVM */ - opt += sizeof("svm=") - 1; - if (!prom_strtobool(opt, )) - prom_svm_enable = val; + if (!prom_xive_disable) { + opt = prom_strstr(prom_cmd_line, "xive=off"); + if (opt) { + prom_xive_disable = true; + prom_debug("XIVE disabled from cmdline\n"); + } } -#endif /* CONFIG_PPC_SVM */ + +#endif /* CONFIG_PPC_PSERIES */ } #ifdef CONFIG_PPC_PSERIES @@ -1251,6 +1262,12 @@ static void __init prom_parse_xive_model(u8 val, break; case OV5_FEAT(OV5_XIVE_EXPLOIT): /* Only Exploitation mode */ prom_debug("XIVE - exploitation mode supported\n"); + +#ifdef CONFIG_PPC_SVM + if (prom_svm_enable) + prom_panic("WARNING: xive unsupported in Secure VM\n"); +#endif /* CONFIG_PPC_SVM */ + if (prom_xive_disable) { /* * If we __have__ to do XIVE, we're better off ignoring -- 1.8.3.1
Re: [PATCH v3 0/6] implement KASLR for powerpc/fsl_booke/64
在 2020/2/29 12:28, Scott Wood 写道: On Fri, 2020-02-28 at 14:47 +0800, Jason Yan wrote: 在 2020/2/28 13:53, Scott Wood 写道: On Wed, 2020-02-26 at 16:18 +0800, Jason Yan wrote: Hi Daniel, 在 2020/2/26 15:16, Daniel Axtens 写道: Maybe replacing the REG format string in KASLR mode would be sufficient? Most archs have removed the address printing when dumping stack. Do we really have to print this? If we have to do this, maybe we can use "%pK" so that they will be hidden from unprivileged users. I've found the addresses to be useful, especially if I had a way to dump the stack data itself. Wouldn't the register dump also be likely to give away the addresses? If we have to print the address, then kptr_restrict and dmesg_restrict must be set properly so that unprivileged users cannot see them. And how does that work with crash dumps that could be from any context? dmesg_restrict is irrelevant as it just controls who can see the dmesg, not what goes into it. kptr_restrict=1 will only get the value if you're not in any sort of IRQ, *and* if the crashing context happened to have CAP_SYSLOG. No other value of kptr_restrict will ever get you the raw value. I don't see any debug setting for %pK (or %p) to always print the actual address (closest is kptr_restrict=1 but that only works in certain contexts)... from looking at the code it seems it hashes even if kaslr is entirely disabled? Or am I missing something? Yes, %pK (or %p) always hashes whether kaslr is disabled or not. So if we want the real value of the address, we cannot use it. But if you only want to distinguish if two pointers are the same, it's ok. Am I the only one that finds this a bit crazy? If you want to lock a system down then fine, but why wage war on debugging even when there's no randomization going on? Comparing two pointers for equality is not always adequate. AFAIK, %p hashing is only exist because of many legacy address printings and force who really want the raw values to switch to %px or even %lx. It's not the opposite of debugging. Raw address printing is not forbidden, only people need to estimate the risk of adrdress leaks. Turnning to %p may not be a good idea in this situation. So for the REG logs printed when dumping stack, we can disable it when KASLR is open. For the REG logs in other places like show_regs(), only privileged can trigger it, and they are not combind with a symbol, so I think it's ok to keep them. diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index fad50db9dcf2..659c51f0739a 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -2068,7 +2068,10 @@ void show_stack(struct task_struct *tsk, unsigned long *stack) newsp = stack[0]; ip = stack[STACK_FRAME_LR_SAVE]; if (!firstframe || ip != lr) { - printk("["REG"] ["REG"] %pS", sp, ip, (void *)ip); + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) + printk("%pS", (void *)ip); + else + printk("["REG"] ["REG"] %pS", sp, ip, (void *)ip); #ifdef CONFIG_FUNCTION_GRAPH_TRACER ret_addr = ftrace_graph_ret_addr(current, _idx, ip, stack); Thanks, Jason -Scott .
Re: [PATCH v3 0/6] implement KASLR for powerpc/fsl_booke/64
On Fri, 2020-02-28 at 14:47 +0800, Jason Yan wrote: > > 在 2020/2/28 13:53, Scott Wood 写道: > > On Wed, 2020-02-26 at 16:18 +0800, Jason Yan wrote: > > > Hi Daniel, > > > > > > 在 2020/2/26 15:16, Daniel Axtens 写道: > > > > Maybe replacing the REG format string in KASLR mode would be > > > > sufficient? > > > > > > > > > > Most archs have removed the address printing when dumping stack. Do we > > > really have to print this? > > > > > > If we have to do this, maybe we can use "%pK" so that they will be > > > hidden from unprivileged users. > > > > I've found the addresses to be useful, especially if I had a way to dump > > the > > stack data itself. Wouldn't the register dump also be likely to give away > > the > > addresses? > > If we have to print the address, then kptr_restrict and dmesg_restrict > must be set properly so that unprivileged users cannot see them. And how does that work with crash dumps that could be from any context? dmesg_restrict is irrelevant as it just controls who can see the dmesg, not what goes into it. kptr_restrict=1 will only get the value if you're not in any sort of IRQ, *and* if the crashing context happened to have CAP_SYSLOG. No other value of kptr_restrict will ever get you the raw value. > > > > I don't see any debug setting for %pK (or %p) to always print the actual > > address (closest is kptr_restrict=1 but that only works in certain > > contexts)... from looking at the code it seems it hashes even if kaslr is > > entirely disabled? Or am I missing something? > > > > Yes, %pK (or %p) always hashes whether kaslr is disabled or not. So if > we want the real value of the address, we cannot use it. But if you only > want to distinguish if two pointers are the same, it's ok. Am I the only one that finds this a bit crazy? If you want to lock a system down then fine, but why wage war on debugging even when there's no randomization going on? Comparing two pointers for equality is not always adequate. -Scott
Re: [PATCH] libnvdimm/bus: return the outvar 'cmd_rc' error code in __nd_ioctl()
On Tue, Feb 18, 2020 at 1:03 PM Dan Williams wrote: > > On Tue, Feb 18, 2020 at 1:00 PM Jeff Moyer wrote: > > > > Vaibhav Jain writes: > > > > > Presently the error code returned via out variable 'cmd_rc' from the > > > nvdimm-bus controller function is ignored when called from > > > __nd_ioctl() and never communicated back to user-space code that called > > > an ioctl on dimm/bus. > > > > > > This minor patch updates __nd_ioctl() to propagate the value of out > > > variable 'cmd_rc' back to user-space in case it reports an error. > > > > > > Signed-off-by: Vaibhav Jain > > > --- > > > drivers/nvdimm/bus.c | 5 + > > > 1 file changed, 5 insertions(+) > > > > > > diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c > > > index a8b515968569..5b687a27fdf2 100644 > > > --- a/drivers/nvdimm/bus.c > > > +++ b/drivers/nvdimm/bus.c > > > @@ -1153,6 +1153,11 @@ static int __nd_ioctl(struct nvdimm_bus > > > *nvdimm_bus, struct nvdimm *nvdimm, > > > if (rc < 0) > > > goto out_unlock; > > > > > > + if (cmd_rc < 0) { > > > + rc = cmd_rc; > > > + goto out_unlock; > > > + } > > > + > > > if (!nvdimm && cmd == ND_CMD_CLEAR_ERROR && cmd_rc >= 0) { > > > struct nd_cmd_clear_error *clear_err = buf; > > > > Looks good to me. > > > > Reviewed-by: Jeff Moyer > > Applied. Unapplied. This breaks the NVDIMM unit test, and now that I look closer you are likely overlooking the fact that cmd_rc is a translation of the firmware status, while the ioctl rc is whether the command was successfully submitted. If you want the equivalent of cmd_rc in userspace you need to translate the firmware status. See ndctl_cmd_submit_xlat() in libndctl as an example of how the equivalent of cmd_rc is generated from the firmware status.
Re: [PATCH v3 1/7] mm/memory_hotplug: Drop the flags field from struct mhp_restrictions
On Fri, Feb 21, 2020 at 10:25 AM Logan Gunthorpe wrote: > > This variable is not used anywhere and should therefore be removed > from the structure. > > Signed-off-by: Logan Gunthorpe > Reviewed-by: David Hildenbrand Reviewed-by: Dan Williams
Re: [PATCH] selftests: powerpc: Add tlbie_test in .gitignore
Hi [This is an automated email] This commit has been processed because it contains a "Fixes:" tag fixing commit: 93cad5f78995 ("selftests/powerpc: Add test case for tlbie vs mtpidr ordering issue"). The bot has tested the following trees: v5.5.6, v5.4.22, v4.19.106, v4.14.171. v5.5.6: Failed to apply! Possible dependencies: 5eb7cfb3a2b1 ("selftests/powerpc: Add a test of bad (out-of-range) accesses") v5.4.22: Failed to apply! Possible dependencies: 5eb7cfb3a2b1 ("selftests/powerpc: Add a test of bad (out-of-range) accesses") v4.19.106: Failed to apply! Possible dependencies: 16391bfc8623 ("selftests/powerpc: Add test of fork with mapping above 512TB") 5eb7cfb3a2b1 ("selftests/powerpc: Add a test of bad (out-of-range) accesses") 7b570361f6f6 ("selftests/powerpc: Add missing newline at end of file") b7683fc66eba ("selftests/powerpc: Add a test of wild bctr") v4.14.171: Failed to apply! Possible dependencies: 16391bfc8623 ("selftests/powerpc: Add test of fork with mapping above 512TB") 5eb7cfb3a2b1 ("selftests/powerpc: Add a test of bad (out-of-range) accesses") 6ed361586b32 ("selftests/powerpc: Add a test of SEGV error behaviour") 7b570361f6f6 ("selftests/powerpc: Add missing newline at end of file") b7683fc66eba ("selftests/powerpc: Add a test of wild bctr") NOTE: The patch will not be queued to stable trees until it is upstream. How should we proceed with this patch? -- Thanks Sasha
[PATCH v3 5/5] libnvdimm/region: Introduce an 'align' attribute
The align attribute applies an alignment constraint for namespace creation in a region. Whereas the 'align' attribute of a namespace applied alignment padding via an info block, the 'align' attribute applies alignment constraints to the free space allocation. The default for 'align' is the maximum known memremap_compat_align() across all archs (16MiB from PowerPC at time of writing) multiplied by the number of interleave ways if there is blk-aliasing. The minimum is PAGE_SIZE and allows for the creation of cross-arch incompatible namespaces, just as previous kernels allowed, but the expectation is cross-arch and mode-independent compatibility by default. The regression risk with this change is limited to cases that were dependent on the ability to create unaligned namespaces, *and* for some reason are unable to opt-out of aligned namespaces by writing to 'regionX/align'. If such a scenario arises the default can be flipped from opt-out to opt-in of compat-aligned namespace creation, but that is a last resort. The kernel will otherwise continue to support existing defined misaligned namespaces. Unfortunately this change needs to touch several parts of the implementation at once: - region/available_size: expand busy extents to current align - region/max_available_extent: expand busy extents to current align - namespace/size: trim free space to current align ...to keep the free space accounting conforming to the dynamic align setting. Reported-by: Aneesh Kumar K.V Reported-by: Jeff Moyer Reviewed-by: Aneesh Kumar K.V Reviewed-by: Jeff Moyer Link: https://lore.kernel.org/r/158041478371.3889308.14542630147672668068.st...@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams --- drivers/nvdimm/dimm_devs.c | 86 +++ drivers/nvdimm/namespace_devs.c |9 ++- drivers/nvdimm/nd.h |1 drivers/nvdimm/region_devs.c| 122 --- 4 files changed, 192 insertions(+), 26 deletions(-) diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c index 39a61a514746..b7b77e8d9027 100644 --- a/drivers/nvdimm/dimm_devs.c +++ b/drivers/nvdimm/dimm_devs.c @@ -563,6 +563,21 @@ int nvdimm_security_freeze(struct nvdimm *nvdimm) return rc; } +static unsigned long dpa_align(struct nd_region *nd_region) +{ + struct device *dev = _region->dev; + + if (dev_WARN_ONCE(dev, !is_nvdimm_bus_locked(dev), + "bus lock required for capacity provision\n")) + return 0; + if (dev_WARN_ONCE(dev, !nd_region->ndr_mappings || nd_region->align + % nd_region->ndr_mappings, + "invalid region align %#lx mappings: %d\n", + nd_region->align, nd_region->ndr_mappings)) + return 0; + return nd_region->align / nd_region->ndr_mappings; +} + int alias_dpa_busy(struct device *dev, void *data) { resource_size_t map_end, blk_start, new; @@ -571,6 +586,7 @@ int alias_dpa_busy(struct device *dev, void *data) struct nd_region *nd_region; struct nvdimm_drvdata *ndd; struct resource *res; + unsigned long align; int i; if (!is_memory(dev)) @@ -608,13 +624,21 @@ int alias_dpa_busy(struct device *dev, void *data) * Find the free dpa from the end of the last pmem allocation to * the end of the interleave-set mapping. */ + align = dpa_align(nd_region); + if (!align) + return 0; + for_each_dpa_resource(ndd, res) { + resource_size_t start, end; + if (strncmp(res->name, "pmem", 4) != 0) continue; - if ((res->start >= blk_start && res->start < map_end) - || (res->end >= blk_start - && res->end <= map_end)) { - new = max(blk_start, min(map_end + 1, res->end + 1)); + + start = ALIGN_DOWN(res->start, align); + end = ALIGN(res->end + 1, align) - 1; + if ((start >= blk_start && start < map_end) + || (end >= blk_start && end <= map_end)) { + new = max(blk_start, min(map_end, end) + 1); if (new != blk_start) { blk_start = new; goto retry; @@ -654,6 +678,7 @@ resource_size_t nd_blk_available_dpa(struct nd_region *nd_region) .res = NULL, }; struct resource *res; + unsigned long align; if (!ndd) return 0; @@ -661,10 +686,20 @@ resource_size_t nd_blk_available_dpa(struct nd_region *nd_region) device_for_each_child(_bus->dev, , alias_dpa_busy); /* now account for busy blk allocations in unaliased dpa */ + align = dpa_align(nd_region); + if (!align) +
[PATCH v3 4/5] libnvdimm/region: Introduce NDD_LABELING
The NDD_ALIASING flag is used to indicate where pmem capacity might alias with blk capacity and require labeling. It is also used to indicate whether the DIMM supports labeling. Separate this latter capability into its own flag so that the NDD_ALIASING flag is scoped to true aliased configurations. To my knowledge aliased configurations only exist in the ACPI spec, there are no known platforms that ship this support in production. This clarity allows namespace-capacity alignment constraints around interleave-ways to be relaxed. Cc: Vishal Verma Cc: Oliver O'Halloran Reviewed-by: Jeff Moyer Reviewed-by: Aneesh Kumar K.V Link: https://lore.kernel.org/r/158041477856.3889308.4212605617834097674.st...@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams --- arch/powerpc/platforms/pseries/papr_scm.c |2 +- drivers/acpi/nfit/core.c |4 +++- drivers/nvdimm/dimm.c |2 +- drivers/nvdimm/dimm_devs.c|9 + drivers/nvdimm/namespace_devs.c |2 +- drivers/nvdimm/nd.h |2 +- drivers/nvdimm/region_devs.c | 10 +- include/linux/libnvdimm.h |2 ++ 8 files changed, 19 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index 0b4467e378e5..589858cb3203 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -328,7 +328,7 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p) } dimm_flags = 0; - set_bit(NDD_ALIASING, _flags); + set_bit(NDD_LABELING, _flags); p->nvdimm = nvdimm_create(p->bus, p, NULL, dimm_flags, PAPR_SCM_DIMM_CMD_MASK, 0, NULL); diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c index a3320f93616d..71d7f2aa1b12 100644 --- a/drivers/acpi/nfit/core.c +++ b/drivers/acpi/nfit/core.c @@ -2026,8 +2026,10 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *acpi_desc) continue; } - if (nfit_mem->bdw && nfit_mem->memdev_pmem) + if (nfit_mem->bdw && nfit_mem->memdev_pmem) { set_bit(NDD_ALIASING, ); + set_bit(NDD_LABELING, ); + } /* collate flags across all memdevs for this dimm */ list_for_each_entry(nfit_memdev, _desc->memdevs, list) { diff --git a/drivers/nvdimm/dimm.c b/drivers/nvdimm/dimm.c index 64776ed15bb3..7d4ddc4d9322 100644 --- a/drivers/nvdimm/dimm.c +++ b/drivers/nvdimm/dimm.c @@ -99,7 +99,7 @@ static int nvdimm_probe(struct device *dev) if (ndd->ns_current >= 0) { rc = nd_label_reserve_dpa(ndd); if (rc == 0) - nvdimm_set_aliasing(dev); + nvdimm_set_labeling(dev); } nvdimm_bus_unlock(dev); diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c index 94ea6dba6b4f..39a61a514746 100644 --- a/drivers/nvdimm/dimm_devs.c +++ b/drivers/nvdimm/dimm_devs.c @@ -32,7 +32,7 @@ int nvdimm_check_config_data(struct device *dev) if (!nvdimm->cmd_mask || !test_bit(ND_CMD_GET_CONFIG_DATA, >cmd_mask)) { - if (test_bit(NDD_ALIASING, >flags)) + if (test_bit(NDD_LABELING, >flags)) return -ENXIO; else return -ENOTTY; @@ -173,11 +173,11 @@ int nvdimm_set_config_data(struct nvdimm_drvdata *ndd, size_t offset, return rc; } -void nvdimm_set_aliasing(struct device *dev) +void nvdimm_set_labeling(struct device *dev) { struct nvdimm *nvdimm = to_nvdimm(dev); - set_bit(NDD_ALIASING, >flags); + set_bit(NDD_LABELING, >flags); } void nvdimm_set_locked(struct device *dev) @@ -312,8 +312,9 @@ static ssize_t flags_show(struct device *dev, { struct nvdimm *nvdimm = to_nvdimm(dev); - return sprintf(buf, "%s%s\n", + return sprintf(buf, "%s%s%s\n", test_bit(NDD_ALIASING, >flags) ? "alias " : "", + test_bit(NDD_LABELING, >flags) ? "label " : "", test_bit(NDD_LOCKED, >flags) ? "lock " : ""); } static DEVICE_ATTR_RO(flags); diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c index 68e89855f779..2388598ce1a2 100644 --- a/drivers/nvdimm/namespace_devs.c +++ b/drivers/nvdimm/namespace_devs.c @@ -2533,7 +2533,7 @@ static int init_active_labels(struct nd_region *nd_region) if (!ndd) { if (test_bit(NDD_LOCKED, >flags)) /* fail, label data may be unreadable */; - else if (test_bit(NDD_ALIASING, >flags)) + else if (test_bit(NDD_LABELING, >flags)) /* fail, labels
[PATCH v3 3/5] libnvdimm/namespace: Enforce memremap_compat_align()
The pmem driver on PowerPC crashes with the following signature when instantiating misaligned namespaces that map their capacity via memremap_pages(). BUG: Unable to handle kernel data access at 0xc00100040600 Faulting instruction address: 0xc0090790 NIP [c0090790] arch_add_memory+0xc0/0x130 LR [c0090744] arch_add_memory+0x74/0x130 Call Trace: arch_add_memory+0x74/0x130 (unreliable) memremap_pages+0x74c/0xa30 devm_memremap_pages+0x3c/0xa0 pmem_attach_disk+0x188/0x770 nvdimm_bus_probe+0xd8/0x470 With the assumption that only memremap_pages() has alignment constraints, enforce memremap_compat_align() for pmem_should_map_pages(), nd_pfn, and nd_dax cases. This includes preventing the creation of namespaces where the base address is misaligned and cases there infoblock padding parameters are invalid. Reported-by: Aneesh Kumar K.V Cc: Jeff Moyer Fixes: a3619190d62e ("libnvdimm/pfn: stop padding pmem namespaces to section alignment") Signed-off-by: Dan Williams --- drivers/nvdimm/namespace_devs.c | 12 drivers/nvdimm/pfn_devs.c | 26 +++--- 2 files changed, 35 insertions(+), 3 deletions(-) diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c index 032dc61725ff..68e89855f779 100644 --- a/drivers/nvdimm/namespace_devs.c +++ b/drivers/nvdimm/namespace_devs.c @@ -10,6 +10,7 @@ #include #include "nd-core.h" #include "pmem.h" +#include "pfn.h" #include "nd.h" static void namespace_io_release(struct device *dev) @@ -1739,6 +1740,17 @@ struct nd_namespace_common *nvdimm_namespace_common_probe(struct device *dev) return ERR_PTR(-ENODEV); } + if (pmem_should_map_pages(dev)) { + struct nd_namespace_io *nsio = to_nd_namespace_io(>dev); + struct resource *res = >res; + + if (!IS_ALIGNED(res->start | (res->end + 1), + memremap_compat_align())) { + dev_err(>dev, "%pr misaligned, unable to map\n", res); + return ERR_PTR(-EOPNOTSUPP); + } + } + if (is_namespace_pmem(>dev)) { struct nd_namespace_pmem *nspm; diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c index 79fe02d6f657..3bdd4b883d05 100644 --- a/drivers/nvdimm/pfn_devs.c +++ b/drivers/nvdimm/pfn_devs.c @@ -446,6 +446,7 @@ static bool nd_supported_alignment(unsigned long align) int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) { u64 checksum, offset; + struct resource *res; enum nd_pfn_mode mode; struct nd_namespace_io *nsio; unsigned long align, start_pad; @@ -578,13 +579,14 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) * established. */ nsio = to_nd_namespace_io(>dev); - if (offset >= resource_size(>res)) { + res = >res; + if (offset >= resource_size(res)) { dev_err(_pfn->dev, "pfn array size exceeds capacity of %s\n", dev_name(>dev)); return -EOPNOTSUPP; } - if ((align && !IS_ALIGNED(nsio->res.start + offset + start_pad, align)) + if ((align && !IS_ALIGNED(res->start + offset + start_pad, align)) || !IS_ALIGNED(offset, PAGE_SIZE)) { dev_err(_pfn->dev, "bad offset: %#llx dax disabled align: %#lx\n", @@ -592,6 +594,18 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) return -EOPNOTSUPP; } + if (!IS_ALIGNED(res->start + le32_to_cpu(pfn_sb->start_pad), + memremap_compat_align())) { + dev_err(_pfn->dev, "resource start misaligned\n"); + return -EOPNOTSUPP; + } + + if (!IS_ALIGNED(res->end + 1 - le32_to_cpu(pfn_sb->end_trunc), + memremap_compat_align())) { + dev_err(_pfn->dev, "resource end misaligned\n"); + return -EOPNOTSUPP; + } + return 0; } EXPORT_SYMBOL(nd_pfn_validate); @@ -750,7 +764,13 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) start = nsio->res.start; size = resource_size(>res); npfns = PHYS_PFN(size - SZ_8K); - align = max(nd_pfn->align, SUBSECTION_SIZE); + align = max(nd_pfn->align, memremap_compat_align()); + if (!IS_ALIGNED(start, memremap_compat_align())) { + dev_err(_pfn->dev, "%s: start %pa misaligned to %#lx\n", + dev_name(>dev), , + memremap_compat_align()); + return -EINVAL; + } end_trunc = start + size - ALIGN_DOWN(start + size, align); if (nd_pfn->mode == PFN_MODE_PMEM) { /*
[PATCH v3 2/5] libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid
The EOPNOTSUPP return code from the pmem driver indicates that the namespace has a configuration that may be valid, but the current kernel does not support it. Expand this to all of the nd_pfn_validate() error conditions after the infoblock has been verified as self consistent. This prevents exposing the namespace to I/O when the infoblock needs to be corrected, or the system needs to be put into a different configuration (like changing the page size on PowerPC). Cc: Aneesh Kumar K.V Cc: Jeff Moyer Signed-off-by: Dan Williams --- drivers/nvdimm/pfn_devs.c |8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c index a5c25cb87116..79fe02d6f657 100644 --- a/drivers/nvdimm/pfn_devs.c +++ b/drivers/nvdimm/pfn_devs.c @@ -561,14 +561,14 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) dev_dbg(_pfn->dev, "align: %lx:%lx mode: %d:%d\n", nd_pfn->align, align, nd_pfn->mode, mode); - return -EINVAL; + return -EOPNOTSUPP; } } if (align > nvdimm_namespace_capacity(ndns)) { dev_err(_pfn->dev, "alignment: %lx exceeds capacity %llx\n", align, nvdimm_namespace_capacity(ndns)); - return -EINVAL; + return -EOPNOTSUPP; } /* @@ -581,7 +581,7 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) if (offset >= resource_size(>res)) { dev_err(_pfn->dev, "pfn array size exceeds capacity of %s\n", dev_name(>dev)); - return -EBUSY; + return -EOPNOTSUPP; } if ((align && !IS_ALIGNED(nsio->res.start + offset + start_pad, align)) @@ -589,7 +589,7 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) dev_err(_pfn->dev, "bad offset: %#llx dax disabled align: %#lx\n", offset, align); - return -ENXIO; + return -EOPNOTSUPP; } return 0;
[PATCH v3 1/5] mm/memremap_pages: Introduce memremap_compat_align()
The "sub-section memory hotplug" facility allows memremap_pages() users like libnvdimm to compensate for hardware platforms like x86 that have a section size larger than their hardware memory mapping granularity. The compensation that sub-section support affords is being tolerant of physical memory resources shifting by units smaller (64MiB on x86) than the memory-hotplug section size (128 MiB). Where the platform physical-memory mapping granularity is limited by the number and capability of address-decode-registers in the memory controller. While the sub-section support allows memremap_pages() to operate on sub-section (2MiB) granularity, the Power architecture may still require 16MiB alignment on "!radix_enabled()" platforms. In order for libnvdimm to be able to detect and manage this per-arch limitation, introduce memremap_compat_align() as a common minimum alignment across all driver-facing memory-mapping interfaces, and let Power override it to 16MiB in the "!radix_enabled()" case. The assumption / requirement for 16MiB to be a viable memremap_compat_align() value is that Power does not have platforms where its equivalent of address-decode-registers never hardware remaps a persistent memory resource on smaller than 16MiB boundaries. Note that I tried my best to not add a new Kconfig symbol, but header include entanglements defeated the #ifndef memremap_compat_align design pattern and the need to export it defeats the __weak design pattern for arch overrides. Based on an initial patch by Aneesh. Link: http://lore.kernel.org/r/capcyv4gbgnp95apyabcsocea50tqj9b5h__83vgngjq3oug...@mail.gmail.com Reported-by: Aneesh Kumar K.V Reported-by: Jeff Moyer Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Signed-off-by: Dan Williams --- arch/powerpc/Kconfig |1 + arch/powerpc/mm/ioremap.c | 21 + drivers/nvdimm/pfn_devs.c |2 +- include/linux/memremap.h |8 include/linux/mmzone.h|1 + lib/Kconfig |3 +++ mm/memremap.c | 23 +++ 7 files changed, 58 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 497b7d0b2d7e..e6ffe905e2b9 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -122,6 +122,7 @@ config PPC select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_KCOV select ARCH_HAS_HUGEPD if HUGETLB_PAGE + select ARCH_HAS_MEMREMAP_COMPAT_ALIGN select ARCH_HAS_MMIOWB if PPC64 select ARCH_HAS_PHYS_TO_DMA select ARCH_HAS_PMEM_API diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c index fc669643ce6a..b1a0aebe8c48 100644 --- a/arch/powerpc/mm/ioremap.c +++ b/arch/powerpc/mm/ioremap.c @@ -2,6 +2,7 @@ #include #include +#include #include #include @@ -97,3 +98,23 @@ void __iomem *do_ioremap(phys_addr_t pa, phys_addr_t offset, unsigned long size, return NULL; } + +#ifdef CONFIG_ZONE_DEVICE +/* + * Override the generic version in mm/memremap.c. + * + * With hash translation, the direct-map range is mapped with just one + * page size selected by htab_init_page_sizes(). Consult + * mmu_psize_defs[] to determine the minimum page size alignment. +*/ +unsigned long memremap_compat_align(void) +{ + unsigned int shift = mmu_psize_defs[mmu_linear_psize].shift; + + if (radix_enabled()) + return SUBSECTION_SIZE; + return max(SUBSECTION_SIZE, 1UL << shift); + +} +EXPORT_SYMBOL_GPL(memremap_compat_align); +#endif diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c index b94f7a7e94b8..a5c25cb87116 100644 --- a/drivers/nvdimm/pfn_devs.c +++ b/drivers/nvdimm/pfn_devs.c @@ -750,7 +750,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) start = nsio->res.start; size = resource_size(>res); npfns = PHYS_PFN(size - SZ_8K); - align = max(nd_pfn->align, (1UL << SUBSECTION_SHIFT)); + align = max(nd_pfn->align, SUBSECTION_SIZE); end_trunc = start + size - ALIGN_DOWN(start + size, align); if (nd_pfn->mode == PFN_MODE_PMEM) { /* diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 6fefb09af7c3..8af1cbd8f293 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -132,6 +132,7 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn, unsigned long vmem_altmap_offset(struct vmem_altmap *altmap); void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns); +unsigned long memremap_compat_align(void); #else static inline void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap) @@ -165,6 +166,12 @@ static inline void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns) { } + +/* when memremap_pages() is disabled all archs can remap a single page */ +static inline unsigned long memremap_compat_align(void) +{ + return PAGE_SIZE; +}
[PATCH v3 0/5] libnvdimm: Cross-arch compatible namespace alignment
Changes since v2 [1]: - Fix up a missing space in flags_show() (Jeff) - Prompted by Jeff saying that v2 only worked for him if memremap_compat_align() returned PAGE_SIZE (which defeats the purpose) I developed a new ndctl unit test that runs through the possible legacy configurations that the kernel needs to support. Several changes fell out as a result: - Update nd_pfn_validate() to add more -EOPNOTSUPP cases. That error code indicates "Stop, the pfn looks coherent, but invalid. Do not proceed with exposing a raw namespace, require the user to investigate whether the infoblock needs to be rewritten, or the kernel configuration (like PAGE_SIZE) needs to change." - Move the validation of fsdax and devdax infoblocks to nd_pfn_validate() so that the presence of non-zero 'start_pad' and 'end_trunc' can be considered in the alignment validation. - Fail namespace creation when the base address is misaligned. A non-zero-start_pad prevents dax operation due to original bug of ->data_offset being base address relative when it should have been ->start_pad relative. So, reject all base address misaligned namespaces in nd_pfn_init(). [1]: http://lore.kernel.org/r/158155489850.3343782.2687127373754434980.st...@dwillia2-desk3.amr.corp.intel.com --- Review / merge logistics notes: Patch "libnvdimm/namespace: Enforce memremap_compat_align()" has changed enough that it needs to be reviewed again. Patch "mm/memremap_pages: Introduce memremap_compat_align()" still needs a PowerPC maintainer ack for the touches to arch/powerpc/mm/ioremap.c. --- Aneesh reports that PowerPC requires 16MiB alignment for the address range passed to devm_memremap_pages(), and Jeff reports that it is possible to create a misaligned namespace which blocks future namespace creation in that region. Both of these issues require namespace alignment to be managed at the region level rather than padding at the namespace level which has been a broken approach to date. Introduce memremap_compat_align() to indicate the hard requirements of an arch's memremap_pages() implementation. Use the maximum known memremap_compat_align() to set the default namespace alignment for libnvdimm. Consult that alignment when allocating free space. Finally, allow the default region alignment to be overridden to maintain the same namespace creation capability as previous kernels. The ndctl unit tests, which have some misaligned namespace assumptions, are updated to use the alignment override where necessary. Thanks to Aneesh for early feedback and testing on this improved alignment handling. --- Dan Williams (5): mm/memremap_pages: Introduce memremap_compat_align() libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid libnvdimm/namespace: Enforce memremap_compat_align() libnvdimm/region: Introduce NDD_LABELING libnvdimm/region: Introduce an 'align' attribute arch/powerpc/Kconfig |1 arch/powerpc/mm/ioremap.c | 21 + arch/powerpc/platforms/pseries/papr_scm.c |2 drivers/acpi/nfit/core.c |4 + drivers/nvdimm/dimm.c |2 drivers/nvdimm/dimm_devs.c| 95 + drivers/nvdimm/namespace_devs.c | 23 - drivers/nvdimm/nd.h |3 - drivers/nvdimm/pfn_devs.c | 34 ++- drivers/nvdimm/region_devs.c | 132 ++--- include/linux/libnvdimm.h |2 include/linux/memremap.h |8 ++ include/linux/mmzone.h|1 lib/Kconfig |3 + mm/memremap.c | 23 + 15 files changed, 307 insertions(+), 47 deletions(-) base-commit: 11a48a5a18c63fd7621bb050228cebf13566e4d8
Re: [PATCH v3 10/27] powerpc: Add driver for OpenCAPI Persistent Memory
Le 21/02/2020 à 04:27, Alastair D'Silva a écrit : From: Alastair D'Silva This driver exposes LPC memory on OpenCAPI pmem cards as an NVDIMM, allowing the existing nvram infrastructure to be used. Namespace metadata is stored on the media itself, so scm_reserve_metadata() maps 1 section's worth of PMEM storage at the start to hold this. The rest of the PMEM range is registered with libnvdimm as an nvdimm. scm_ndctl_config_read/write/size() provide callbacks to libnvdimm to access the metadata. Signed-off-by: Alastair D'Silva --- arch/powerpc/platforms/powernv/Kconfig| 3 + arch/powerpc/platforms/powernv/Makefile | 1 + arch/powerpc/platforms/powernv/pmem/Kconfig | 15 + arch/powerpc/platforms/powernv/pmem/Makefile | 7 + arch/powerpc/platforms/powernv/pmem/ocxl.c| 473 ++ .../platforms/powernv/pmem/ocxl_internal.h| 28 ++ 6 files changed, 527 insertions(+) create mode 100644 arch/powerpc/platforms/powernv/pmem/Kconfig create mode 100644 arch/powerpc/platforms/powernv/pmem/Makefile create mode 100644 arch/powerpc/platforms/powernv/pmem/ocxl.c create mode 100644 arch/powerpc/platforms/powernv/pmem/ocxl_internal.h diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig index 938803eab0ad..fc8976af0e52 100644 --- a/arch/powerpc/platforms/powernv/Kconfig +++ b/arch/powerpc/platforms/powernv/Kconfig @@ -50,3 +50,6 @@ config PPC_VAS config SCOM_DEBUGFS bool "Expose SCOM controllers via debugfs" depends on DEBUG_FS + +source "arch/powerpc/platforms/powernv/pmem/Kconfig" + diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile index c0f8120045c3..0bbd72988b6f 100644 --- a/arch/powerpc/platforms/powernv/Makefile +++ b/arch/powerpc/platforms/powernv/Makefile @@ -21,3 +21,4 @@ obj-$(CONFIG_PPC_VAS) += vas.o vas-window.o vas-debug.o obj-$(CONFIG_OCXL_BASE) += ocxl.o obj-$(CONFIG_SCOM_DEBUGFS) += opal-xscom.o obj-$(CONFIG_PPC_SECURE_BOOT) += opal-secvar.o +obj-$(CONFIG_LIBNVDIMM) += pmem/ diff --git a/arch/powerpc/platforms/powernv/pmem/Kconfig b/arch/powerpc/platforms/powernv/pmem/Kconfig new file mode 100644 index ..c5d927520920 --- /dev/null +++ b/arch/powerpc/platforms/powernv/pmem/Kconfig @@ -0,0 +1,15 @@ +# SPDX-License-Identifier: GPL-2.0-only +if LIBNVDIMM + +config OCXL_PMEM + tristate "OpenCAPI Persistent Memory" + depends on LIBNVDIMM && PPC_POWERNV && PCI && EEH && ZONE_DEVICE && OCXL + help + Exposes devices that implement the OpenCAPI Storage Class Memory + specification as persistent memory regions. You may also want + DEV_DAX, DEV_DAX_PMEM & FS_DAX if you plan on using DAX devices + stacked on top of this driver. + + Select N if unsure. + +endif diff --git a/arch/powerpc/platforms/powernv/pmem/Makefile b/arch/powerpc/platforms/powernv/pmem/Makefile new file mode 100644 index ..1c55c4193175 --- /dev/null +++ b/arch/powerpc/platforms/powernv/pmem/Makefile @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0 + +ccflags-$(CONFIG_PPC_WERROR) += -Werror + +obj-$(CONFIG_OCXL_PMEM) += ocxlpmem.o + +ocxlpmem-y := ocxl.o diff --git a/arch/powerpc/platforms/powernv/pmem/ocxl.c b/arch/powerpc/platforms/powernv/pmem/ocxl.c new file mode 100644 index ..3c4eeb5dcc0f --- /dev/null +++ b/arch/powerpc/platforms/powernv/pmem/ocxl.c @@ -0,0 +1,473 @@ +// SPDX-License-Id +// Copyright 2019 IBM Corp. + +/* + * A driver for OpenCAPI devices that implement the Storage Class + * Memory specification. + */ + +#include +#include +#include +#include +#include +#include "ocxl_internal.h" + + +static const struct pci_device_id ocxlpmem_pci_tbl[] = { + { PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0625), }, + { } +}; + +MODULE_DEVICE_TABLE(pci, ocxlpmem_pci_tbl); + +#define NUM_MINORS 256 // Total to reserve + +static dev_t ocxlpmem_dev; +static struct class *ocxlpmem_class; +static struct mutex minors_idr_lock; +static struct idr minors_idr; + +/** + * ndctl_config_write() - Handle a ND_CMD_SET_CONFIG_DATA command from ndctl + * @ocxlpmem: the device metadata + * @command: the incoming data to write + * Return: 0 on success, negative on failure + */ +static int ndctl_config_write(struct ocxlpmem *ocxlpmem, + struct nd_cmd_set_config_hdr *command) +{ + if (command->in_offset + command->in_length > LABEL_AREA_SIZE) + return -EINVAL; + + memcpy_flushcache(ocxlpmem->metadata_addr + command->in_offset, command->in_buf, + command->in_length); + + return 0; +} + +/** + * ndctl_config_read() - Handle a ND_CMD_GET_CONFIG_DATA command from ndctl + * @ocxlpmem: the device metadata + * @command: the read request + * Return: 0 on success, negative on failure + */ +static int ndctl_config_read(struct ocxlpmem *ocxlpmem, +struct nd_cmd_get_config_data_hdr
Re: [PATCH] macintosh: therm_windtunnel: fix regression when instantiating devices
On Tue, Feb 25, 2020 at 03:12:29PM +0100, Wolfram Sang wrote: > Removing attach_adapter from this driver caused a regression for at > least some machines. Those machines had the sensors described in their > DT, too, so they didn't need manual creation of the sensor devices. The > old code worked, though, because manual creation came first. Creation of > DT devices then failed later and caused error logs, but the sensors > worked nonetheless because of the manually created devices. > > When removing attach_adaper, manual creation now comes later and loses > the race. The sensor devices were already registered via DT, yet with > another binding, so the driver could not be bound to it. > > This fix refactors the code to remove the race and only manually creates > devices if there are no DT nodes present. Also, the DT binding is updated > to match both, the DT and manually created devices. Because we don't > know which device creation will be used at runtime, the code to start > the kthread is moved to do_probe() which will be called by both methods. > > Fixes: 3e7bed52719d ("macintosh: therm_windtunnel: drop using attach_adapter") > Link: https://bugzilla.kernel.org/show_bug.cgi?id=201723 > Reported-by: Erhard Furtner > Tested-by: Erhard Furtner > Signed-off-by: Wolfram Sang Applied to for-current, thanks! signature.asc Description: PGP signature
Re: [PATCH] macintosh: therm_windtunnel: fix regression when instantiating devices
> I think that would be best, it's more I2C related than powerpc arch > stuff that I could review. It is more DT handling than I2C, but I am happy to take this patch. signature.asc Description: PGP signature
RE: [PATCH v4 00/11] Add the multiple PF support for DWC and Layerscape
> -Original Message- > From: Lorenzo Pieralisi > Sent: 2020年2月28日 19:30 > To: Xiaowei Bao > Cc: robh...@kernel.org; mark.rutl...@arm.com; shawn...@kernel.org; Leo > Li ; kis...@ti.com; M.h. Lian > ; Mingkai Hu ; Roy Zang > ; jingooh...@gmail.com; > gustavo.pimen...@synopsys.com; linux-...@vger.kernel.org; > linux-arm-ker...@lists.infradead.org; devicet...@vger.kernel.org; > linux-ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Andrew Murray > > Subject: Re: [PATCH v4 00/11] Add the multiple PF support for DWC and > Layerscape > > On Tue, Sep 24, 2019 at 10:18:38AM +0800, Xiaowei Bao wrote: > > Add the PCIe EP multiple PF support for DWC and Layerscape, add the > > doorbell MSIX function for DWC, use list to manage the PF of one PCIe > > controller, and refactor the Layerscape EP driver due to some > > platforms difference. > > > > Xiaowei Bao (11): > > PCI: designware-ep: Add multiple PFs support for DWC > > PCI: designware-ep: Add the doorbell mode of MSI-X in EP mode > > PCI: designware-ep: Move the function of getting MSI capability > > forward > > PCI: designware-ep: Modify MSI and MSIX CAP way of finding > > dt-bindings: pci: layerscape-pci: add compatible strings for ls1088a > > and ls2088a > > PCI: layerscape: Fix some format issue of the code > > PCI: layerscape: Modify the way of getting capability with different > > PEX > > PCI: layerscape: Modify the MSIX to the doorbell mode > > PCI: layerscape: Add EP mode support for ls1088a and ls2088a > > arm64: dts: layerscape: Add PCIe EP node for ls1088a > > misc: pci_endpoint_test: Add LS1088a in pci_device_id table > > > > .../devicetree/bindings/pci/layerscape-pci.txt | 2 + > > arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi | 31 +++ > > drivers/misc/pci_endpoint_test.c | 2 + > > drivers/pci/controller/dwc/pci-layerscape-ep.c | 100 ++-- > > drivers/pci/controller/dwc/pcie-designware-ep.c| 255 > + > > drivers/pci/controller/dwc/pcie-designware.c | 59 +++-- > > drivers/pci/controller/dwc/pcie-designware.h | 48 +++- > > 7 files changed, 404 insertions(+), 93 deletions(-) > > Hi, > > are you resending this patchset ? I would also like Andrew and Kishon to have > a look and ACK relevant code before merging it. Thank you for following these patches, I have replay the email to Kishon, and I will decide whether to resend the sets patches based on his comments. Thanks Xiaowei > Thanks, > Lorenzo
RE: [PATCH v4 08/11] PCI: layerscape: Modify the MSIX to the doorbell mode
> -Original Message- > From: Kishon Vijay Abraham I > Sent: 2020年2月28日 19:41 > To: Xiaowei Bao ; robh...@kernel.org; > mark.rutl...@arm.com; shawn...@kernel.org; Leo Li > ; lorenzo.pieral...@arm.com; M.h. Lian > ; Mingkai Hu ; Roy Zang > ; jingooh...@gmail.com; > gustavo.pimen...@synopsys.com; andrew.mur...@arm.com; > linux-...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; > devicet...@vger.kernel.org; linux-ker...@vger.kernel.org; > linuxppc-dev@lists.ozlabs.org > Subject: Re: [PATCH v4 08/11] PCI: layerscape: Modify the MSIX to the > doorbell mode > > Hi Xiaowei, > > On 24/09/19 7:48 am, Xiaowei Bao wrote: > > dw_pcie_ep_raise_msix_irq was never called in the exisitng driver > > before, because the ls1046a platform don't support the MSIX feature > > and msix_capable was always set to false. > > Now that add the ls1088a platform with MSIX support, but the existing > > dw_pcie_ep_raise_msix_irq doesn't work, so use the doorbell method to > > support the MSIX feature. > > > It does work after [1]. So the commit message might not be exactly true. Got it, I will verify it with your patch, do you mean that I should correct the commit message? I think we can reserve my MSI-X patch, this patch can provide another MSI-X trigger way, and this way is clearly stated in the DWC manual, thanks, please give your comments. Thanks Xiaowei > > [1] -> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.ke > rnel.org%2Fr%2F20200225081703.8857-1-kishon%40ti.comdata=02% > 7C01%7Cxiaowei.bao%40nxp.com%7C84191df0cd09451ef3e608d7bc427745 > %7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C637184865969684 > 169sdata=K9fvbpQ4xuuZKhWT6UV2M4SVyHxQ3LjpuJQztktLaRI%3D& > amp;reserved=0 > > Thanks > Kishon > > > > > Signed-off-by: Xiaowei Bao > > Reviewed-by: Andrew Murray > > --- > > v2: > > - No change > > v3: > > - Modify the commit message make it clearly. > > v4: > > - No change > > > > drivers/pci/controller/dwc/pci-layerscape-ep.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c > > b/drivers/pci/controller/dwc/pci-layerscape-ep.c > > index 1e07287..5f0cb99 100644 > > --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c > > +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c > > @@ -79,7 +79,8 @@ static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, > u8 func_no, > > case PCI_EPC_IRQ_MSI: > > return dw_pcie_ep_raise_msi_irq(ep, func_no, interrupt_num); > > case PCI_EPC_IRQ_MSIX: > > - return dw_pcie_ep_raise_msix_irq(ep, func_no, interrupt_num); > > + return dw_pcie_ep_raise_msix_irq_doorbell(ep, func_no, > > + interrupt_num); > > default: > > dev_err(pci->dev, "UNKNOWN IRQ type\n"); > > return -EINVAL; > >
Re: [PATCH v4 08/11] PCI: layerscape: Modify the MSIX to the doorbell mode
Hi Xiaowei, On 24/09/19 7:48 am, Xiaowei Bao wrote: > dw_pcie_ep_raise_msix_irq was never called in the exisitng driver > before, because the ls1046a platform don't support the MSIX feature > and msix_capable was always set to false. > Now that add the ls1088a platform with MSIX support, but the existing > dw_pcie_ep_raise_msix_irq doesn't work, so use the doorbell method to > support the MSIX feature. It does work after [1]. So the commit message might not be exactly true. [1] -> https://lore.kernel.org/r/20200225081703.8857-1-kis...@ti.com Thanks Kishon > > Signed-off-by: Xiaowei Bao > Reviewed-by: Andrew Murray > --- > v2: > - No change > v3: > - Modify the commit message make it clearly. > v4: > - No change > > drivers/pci/controller/dwc/pci-layerscape-ep.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c > b/drivers/pci/controller/dwc/pci-layerscape-ep.c > index 1e07287..5f0cb99 100644 > --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c > +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c > @@ -79,7 +79,8 @@ static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 > func_no, > case PCI_EPC_IRQ_MSI: > return dw_pcie_ep_raise_msi_irq(ep, func_no, interrupt_num); > case PCI_EPC_IRQ_MSIX: > - return dw_pcie_ep_raise_msix_irq(ep, func_no, interrupt_num); > + return dw_pcie_ep_raise_msix_irq_doorbell(ep, func_no, > + interrupt_num); > default: > dev_err(pci->dev, "UNKNOWN IRQ type\n"); > return -EINVAL; >
Re: [PATCH] selftests: pidfd: Add pidfd_fdinfo_test in .gitignore
On Fri, Feb 28, 2020 at 01:18:44AM +0100, Christian Brauner wrote: > On February 28, 2020 1:00:08 AM GMT+01:00, Christophe Leroy > wrote: > >The commit identified below added pidfd_fdinfo_test > >but failed to add it to .gitignore > > > >Fixes: 2def297ec7fb ("pidfd: add tests for NSpid info in fdinfo") > >Cc: sta...@vger.kernel.org > >Signed-off-by: Christophe Leroy > >--- > > tools/testing/selftests/pidfd/.gitignore | 1 + > > 1 file changed, 1 insertion(+) > > > >diff --git a/tools/testing/selftests/pidfd/.gitignore > >b/tools/testing/selftests/pidfd/.gitignore > >index 3a779c084d96..39559d723c41 100644 > >--- a/tools/testing/selftests/pidfd/.gitignore > >+++ b/tools/testing/selftests/pidfd/.gitignore > >@@ -2,4 +2,5 @@ pidfd_open_test > > pidfd_poll_test > > pidfd_test > > pidfd_wait > >+pidfd_fdinfo_test > > pidfd_getfd_test > > Thanks for spotting this. > I'll pick this up along with other fixes I have waiting. > > Acked-by: Christian Brauner Applied, thanks! Christian
Re: [PATCH v4 00/11] Add the multiple PF support for DWC and Layerscape
On Tue, Sep 24, 2019 at 10:18:38AM +0800, Xiaowei Bao wrote: > Add the PCIe EP multiple PF support for DWC and Layerscape, add > the doorbell MSIX function for DWC, use list to manage the PF of > one PCIe controller, and refactor the Layerscape EP driver due to > some platforms difference. > > Xiaowei Bao (11): > PCI: designware-ep: Add multiple PFs support for DWC > PCI: designware-ep: Add the doorbell mode of MSI-X in EP mode > PCI: designware-ep: Move the function of getting MSI capability > forward > PCI: designware-ep: Modify MSI and MSIX CAP way of finding > dt-bindings: pci: layerscape-pci: add compatible strings for ls1088a > and ls2088a > PCI: layerscape: Fix some format issue of the code > PCI: layerscape: Modify the way of getting capability with different > PEX > PCI: layerscape: Modify the MSIX to the doorbell mode > PCI: layerscape: Add EP mode support for ls1088a and ls2088a > arm64: dts: layerscape: Add PCIe EP node for ls1088a > misc: pci_endpoint_test: Add LS1088a in pci_device_id table > > .../devicetree/bindings/pci/layerscape-pci.txt | 2 + > arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi | 31 +++ > drivers/misc/pci_endpoint_test.c | 2 + > drivers/pci/controller/dwc/pci-layerscape-ep.c | 100 ++-- > drivers/pci/controller/dwc/pcie-designware-ep.c| 255 > + > drivers/pci/controller/dwc/pcie-designware.c | 59 +++-- > drivers/pci/controller/dwc/pcie-designware.h | 48 +++- > 7 files changed, 404 insertions(+), 93 deletions(-) Hi, are you resending this patchset ? I would also like Andrew and Kishon to have a look and ACK relevant code before merging it. Thanks, Lorenzo
[PATCHv2 2/2] pSeries/papr_scm: buffer pmem's bound addr in dt for kexec kernel
At present, plpar_hcall(H_SCM_BIND_MEM, ...) takes a very long time, so if dumping to fsdax, it will take a very long time. Take a closer look, during the papr_scm initialization, the only configuration is through drc_pmem_bind()-> plpar_hcall(H_SCM_BIND_MEM, ...), which helps to set up the bound address. On pseries, for kexec -l/-p kernel, there is no reset of hardware, and this step can be stepped around to save times. So the pmem bound address can be passed to the 2nd kernel through a dynamic added property "bound-addr" in dt node 'ibm,pmemory'. Signed-off-by: Pingfan Liu To: linuxppc-dev@lists.ozlabs.org Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Hari Bathini Cc: Aneesh Kumar K.V Cc: Oliver O'Halloran Cc: Dan Williams Cc: Andrew Donnellan Cc: Christophe Leroy Cc: ke...@lists.infradead.org --- note: This patch has not been tested since I can not get such a pseries with pmem. Please kindly to give some suggestion, thanks. arch/powerpc/platforms/pseries/papr_scm.c | 32 +-- 1 file changed, 22 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index 0b4467e..40cd214 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -14,6 +14,7 @@ #include #include +#include "of_helpers.h" #define BIND_ANY_ADDR (~0ul) @@ -383,7 +384,7 @@ static int papr_scm_probe(struct platform_device *pdev) { struct device_node *dn = pdev->dev.of_node; u32 drc_index, metadata_size; - u64 blocks, block_size; + u64 blocks, block_size, bound_addr = 0; struct papr_scm_priv *p; const char *uuid_str; u64 uuid[2]; @@ -440,17 +441,28 @@ static int papr_scm_probe(struct platform_device *pdev) p->metadata_size = metadata_size; p->pdev = pdev; - /* request the hypervisor to bind this region to somewhere in memory */ - rc = drc_pmem_bind(p); + of_property_read_u64(dn, "bound-addr", _addr); + if (bound_addr) { + p->bound_addr = bound_addr; + } else { + struct property *property; + u64 big; - /* If phyp says drc memory still bound then force unbound and retry */ - if (rc == H_OVERLAP) - rc = drc_pmem_query_n_bind(p); + /* request the hypervisor to bind this region to somewhere in memory */ + rc = drc_pmem_bind(p); - if (rc != H_SUCCESS) { - dev_err(>pdev->dev, "bind err: %d\n", rc); - rc = -ENXIO; - goto err; + /* If phyp says drc memory still bound then force unbound and retry */ + if (rc == H_OVERLAP) + rc = drc_pmem_query_n_bind(p); + + if (rc != H_SUCCESS) { + dev_err(>pdev->dev, "bind err: %d\n", rc); + rc = -ENXIO; + goto err; + } + big = cpu_to_be64(p->bound_addr); + property = new_property("bound-addr", sizeof(u64), , NULL); + of_add_property(dn, property); } /* setup the resource for the newly bound range */ -- 2.7.5
[PATCHv2 1/2] powerpc/of: split out new_property() for reusing
Splitting out new_property() for coming reusing and moving it to of_helpers.c. Also do some coding style cleanup. Signed-off-by: Pingfan Liu To: linuxppc-dev@lists.ozlabs.org Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Hari Bathini Cc: Aneesh Kumar K.V Cc: Oliver O'Halloran Cc: Dan Williams Cc: Andrew Donnellan Cc: Christophe Leroy Cc: ke...@lists.infradead.org --- arch/powerpc/platforms/pseries/of_helpers.c | 28 arch/powerpc/platforms/pseries/of_helpers.h | 3 +++ arch/powerpc/platforms/pseries/reconfig.c | 26 -- 3 files changed, 31 insertions(+), 26 deletions(-) diff --git a/arch/powerpc/platforms/pseries/of_helpers.c b/arch/powerpc/platforms/pseries/of_helpers.c index 66dfd82..1022e0f 100644 --- a/arch/powerpc/platforms/pseries/of_helpers.c +++ b/arch/powerpc/platforms/pseries/of_helpers.c @@ -7,6 +7,34 @@ #include "of_helpers.h" +struct property *new_property(const char *name, const int length, + const unsigned char *value, struct property *last) +{ + struct property *new = kzalloc(sizeof(*new), GFP_KERNEL); + + if (!new) + return NULL; + + new->name = kstrdup(name, GFP_KERNEL); + if (!new->name) + goto cleanup; + new->value = kmalloc(length + 1, GFP_KERNEL); + if (!new->value) + goto cleanup; + + memcpy(new->value, value, length); + *(((char *)new->value) + length) = 0; + new->length = length; + new->next = last; + return new; + +cleanup: + kfree(new->name); + kfree(new->value); + kfree(new); + return NULL; +} + /** * pseries_of_derive_parent - basically like dirname(1) * @path: the full_name of a node to be added to the tree diff --git a/arch/powerpc/platforms/pseries/of_helpers.h b/arch/powerpc/platforms/pseries/of_helpers.h index decad65..34add82 100644 --- a/arch/powerpc/platforms/pseries/of_helpers.h +++ b/arch/powerpc/platforms/pseries/of_helpers.h @@ -4,6 +4,9 @@ #include +struct property *new_property(const char *name, const int length, + const unsigned char *value, struct property *last); + struct device_node *pseries_of_derive_parent(const char *path); #endif /* _PSERIES_OF_HELPERS_H */ diff --git a/arch/powerpc/platforms/pseries/reconfig.c b/arch/powerpc/platforms/pseries/reconfig.c index 7f7369f..8e5a2ba 100644 --- a/arch/powerpc/platforms/pseries/reconfig.c +++ b/arch/powerpc/platforms/pseries/reconfig.c @@ -165,32 +165,6 @@ static char * parse_next_property(char *buf, char *end, char **name, int *length return tmp; } -static struct property *new_property(const char *name, const int length, -const unsigned char *value, struct property *last) -{ - struct property *new = kzalloc(sizeof(*new), GFP_KERNEL); - - if (!new) - return NULL; - - if (!(new->name = kstrdup(name, GFP_KERNEL))) - goto cleanup; - if (!(new->value = kmalloc(length + 1, GFP_KERNEL))) - goto cleanup; - - memcpy(new->value, value, length); - *(((char *)new->value) + length) = 0; - new->length = length; - new->next = last; - return new; - -cleanup: - kfree(new->name); - kfree(new->value); - kfree(new); - return NULL; -} - static int do_add_node(char *buf, size_t bufsize) { char *path, *end, *name; -- 2.7.5
Re: [PATCH 3/3] pseries/scm: buffer pmem's bound addr in dt for kexec kernel
On Fri, Feb 28, 2020 at 2:52 PM Christophe Leroy wrote: > > > > Le 28/02/2020 à 06:53, Pingfan Liu a écrit : > > At present, plpar_hcall(H_SCM_BIND_MEM, ...) takes a very long time, so > > if dumping to fsdax, it will take a very long time. > > > > Take a closer look, during the papr_scm initialization, the only > > configuration is through drc_pmem_bind()-> plpar_hcall(H_SCM_BIND_MEM, > > ...), which helps to set up the bound address. > > > > On pseries, for kexec -l/-p kernel, there is no reset of hardware, and this > > step can be stepped around to save times. So the pmem bound address can be > > passed to the 2nd kernel through a dynamic added property "bound-addr" in > > dt node 'ibm,pmemory'. > > > > Signed-off-by: Pingfan Liu > > To: linuxppc-dev@lists.ozlabs.org > > Cc: Benjamin Herrenschmidt > > Cc: Paul Mackerras > > Cc: Michael Ellerman > > Cc: Hari Bathini > > Cc: Aneesh Kumar K.V > > Cc: Oliver O'Halloran > > Cc: Dan Williams > > Cc: ke...@lists.infradead.org > > --- > > note: I can not find such a pseries machine, and not finish it yet. > > --- > > arch/powerpc/platforms/pseries/papr_scm.c | 32 > > +-- > > 1 file changed, 22 insertions(+), 10 deletions(-) > > > > diff --git a/arch/powerpc/platforms/pseries/papr_scm.c > > b/arch/powerpc/platforms/pseries/papr_scm.c > > index c2ef320..555e746 100644 > > --- a/arch/powerpc/platforms/pseries/papr_scm.c > > +++ b/arch/powerpc/platforms/pseries/papr_scm.c > > @@ -382,7 +382,7 @@ static int papr_scm_probe(struct platform_device *pdev) > > { > > struct device_node *dn = pdev->dev.of_node; > > u32 drc_index, metadata_size; > > - u64 blocks, block_size; > > + u64 blocks, block_size, bound_addr = 0; > > struct papr_scm_priv *p; > > const char *uuid_str; > > u64 uuid[2]; > > @@ -439,17 +439,29 @@ static int papr_scm_probe(struct platform_device > > *pdev) > > p->metadata_size = metadata_size; > > p->pdev = pdev; > > > > - /* request the hypervisor to bind this region to somewhere in memory > > */ > > - rc = drc_pmem_bind(p); > > + of_property_read_u64(dn, "bound-addr", _addr); > > + if (bound_addr) > > + p->bound_addr = bound_addr; > > + else { > > All legs of an if/else must have { } when one leg need them, see codying > style. OK, > > > + struct property *property; > > + u64 big; > > > > - /* If phyp says drc memory still bound then force unbound and retry */ > > - if (rc == H_OVERLAP) > > - rc = drc_pmem_query_n_bind(p); > > + /* request the hypervisor to bind this region to somewhere in > > memory */ > > + rc = drc_pmem_bind(p); > > > > - if (rc != H_SUCCESS) { > > - dev_err(>pdev->dev, "bind err: %d\n", rc); > > - rc = -ENXIO; > > - goto err; > > + /* If phyp says drc memory still bound then force unbound and > > retry */ > > + if (rc == H_OVERLAP) > > + rc = drc_pmem_query_n_bind(p); > > + > > + if (rc != H_SUCCESS) { > > + dev_err(>pdev->dev, "bind err: %d\n", rc); > > + rc = -ENXIO; > > + goto err; > > + } > > + big = cpu_to_be64(p->bound_addr); > > + property = new_property("bound-addr", sizeof(u64), , > > + NULL); > > Why plitting this line in two parts ? You have lines far longer above. > In powerpc we allow lines 90 chars long. OK, good to know it. Thanks, Pingfan
Re: [PATCH] mm/debug: Add tests validating arch page table helpers for core features
On 02/27/2020 04:12 PM, Christophe Leroy wrote: > > > Le 27/02/2020 à 11:33, Anshuman Khandual a écrit : >> This adds new tests validating arch page table helpers for these following >> core memory features. These tests create and test specific mapping types at >> various page table levels. >> >> * SPECIAL mapping >> * PROTNONE mapping >> * DEVMAP mapping >> * SOFTDIRTY mapping >> * SWAP mapping >> * MIGRATION mapping >> * HUGETLB mapping > > For testing HUGETLB mappings, you also have to include tests of hugepd > functions/helpers. Not all archictures have hugepage size which matches with > page tables levels (e.g. powerpc). Those architectures use hugepd_t. Dont see much hugepd_t in generic HugeTLB. Just wondering which generic hugepd helpers can be tested here. Could you please be bit more specific. As we have not yet started looking for arch specific page table helpers test requirements, all the test scenarios here need to be generic. > > Christophe >