Re: [PATCH v3 2/8] cxl/acpi: Add root device lockdep validation
On Fri, Apr 22, 2022 at 5:08 PM Dan Williams wrote: > > On Fri, Apr 22, 2022 at 4:58 PM Ira Weiny wrote: > > > > On Thu, Apr 21, 2022 at 08:33:18AM -0700, Dan Williams wrote: > > > The CXL "root" device, ACPI0017, is an attach point for coordinating > > > platform level CXL resources and is the parent device for a CXL port > > > topology tree. As such it has distinct locking rules relative to other > > > CXL subsystem objects, but because it is an ACPI device the lock class > > > is established well before it is given to the cxl_acpi driver. > > > > This final sentence gave me pause because it implied that the device lock > > class > > was set to something other than no validate. But I don't see that anywhere > > in > > the acpi code. So given that it looks to me like ACPI is just using the > > default no validate class... > > Oh, good observation. *If* ACPI had set a custom lock class then > cxl_acpi would need to be careful to restore that ACPI-specific class > and not reset it to "no validate" on exit, or skip setting its own > custom class. However, I think for generic buses like ACPI that feed > devices into other subsystems it likely has little reason to set its > own class. For safety, since device_lock_set_class() is general > purpose, I'll have it emit a debug message and fail if the class is > not "no validate" on entry. > So this turned out way uglier than I expected: drivers/cxl/acpi.c |4 +++- include/linux/device.h | 33 + 2 files changed, 28 insertions(+), 9 deletions(-) ...so I'm going to drop it and just add a comment about the expectations. As Peter said there's already a multitude of ways to cause false positive / negative results with lockdep so this is just one more area where one needs to be careful and understand the lock context they might be overriding.
Re: [PATCH v3 8/8] nvdimm: Fix firmware activation deadlock scenarios
On Fri, Apr 22, 2022 at 9:28 PM Ira Weiny wrote: > > On Thu, Apr 21, 2022 at 08:33:51AM -0700, Dan Williams wrote: > > Lockdep reports the following deadlock scenarios for CXL root device > > power-management, device_prepare(), operations, and device_shutdown() > > operations for 'nd_region' devices: > > > > --- > > Chain exists of: > >&nvdimm_region_key --> &nvdimm_bus->reconfig_mutex --> > > system_transition_mutex > > > > Possible unsafe locking scenario: > > > > CPU0CPU1 > > > >lock(system_transition_mutex); > > lock(&nvdimm_bus->reconfig_mutex); > > lock(system_transition_mutex); > >lock(&nvdimm_region_key); > > > > -- > > > > Chain exists of: > >&cxl_nvdimm_bridge_key --> acpi_scan_lock --> &cxl_root_key > > > > Possible unsafe locking scenario: > > > > CPU0CPU1 > > > >lock(&cxl_root_key); > > lock(acpi_scan_lock); > > lock(&cxl_root_key); > >lock(&cxl_nvdimm_bridge_key); > > > > --- > > > > These stem from holding nvdimm_bus_lock() over hibernate_quiet_exec() > > which walks the entire system device topology taking device_lock() along > > the way. The nvdimm_bus_lock() is protecting against unregistration, > > multiple simultaneous ops callers, and preventing activate_show() from > > racing activate_store(). For the first 2, the lock is redundant. > > Unregistration already flushes all ops users, and sysfs already prevents > > multiple threads to be active in an ops handler at the same time. For > > the last userspace should already be waiting for its last > > activate_store() to complete, and does not need activate_show() to flush > > the write side, so this lock usage can be deleted in these attributes. > > > > I'm sorry if this is obvious but why can't the locking be removed from > capability_show() and nvdimm_bus_firmware_visible() as well? It can, that's a good catch, thanks.
Re: [PATCH v13 0/7] fsdax: introduce fs query to support reflink
On Fri, Apr 22, 2022 at 5:02 PM Dave Chinner wrote: > > On Fri, Apr 22, 2022 at 02:27:32PM -0700, Dan Williams wrote: > > On Thu, Apr 21, 2022 at 12:47 AM Dave Chinner wrote: > > > > > > On Wed, Apr 20, 2022 at 10:54:59PM -0700, Christoph Hellwig wrote: > > > > On Thu, Apr 21, 2022 at 02:35:02PM +1000, Dave Chinner wrote: > > > > > Sure, I'm not a maintainer and just the stand-in patch shepherd for > > > > > a single release. However, being unable to cleanly merge code we > > > > > need integrated into our local subsystem tree for integration > > > > > testing because a patch dependency with another subsystem won't gain > > > > > a stable commit ID until the next merge window is distinctly > > > > > suboptimal. > > > > > > > > Yes. Which is why we've taken a lot of mm patchs through other trees, > > > > sometimes specilly crafted for that. So I guess in this case we'll > > > > just need to take non-trivial dependencies into the XFS tree, and just > > > > deal with small merge conflicts for the trivial ones. > > > > > > OK. As Naoyo has pointed out, the first dependency/conflict Ruan has > > > listed looks trivial to resolve. > > > > > > The second dependency, OTOH, is on a new function added in the patch > > > pointed to. That said, at first glance it looks to be independent of > > > the first two patches in that series so I might just be able to pull > > > that one patch in and have that leave us with a working > > > fsdax+reflink tree. > > > > > > Regardless, I'll wait to see how much work the updated XFS/DAX > > > reflink enablement patchset still requires when Ruan posts it before > > > deciding what to do here. If it isn't going to be a merge > > > candidate, what to do with this patchset is moot because there's > > > little to test without reflink enabled... > > > > I do have a use case for this work absent the reflink work. Recall we > > had a conversation about how to communicate "dax-device has been > > ripped away from the fs" events and we ended up on the idea of reusing > > ->notify_failure(), but with the device's entire logical address range > > as the notification span. That will let me unwind and delete the > > PTE_DEVMAP infrastructure for taking extra device references to hold > > off device-removal. Instead ->notify_failure() arranges for all active > > DAX mappings to be invalidated and allow the removal to proceed > > especially since physical removal does not care about software pins. > > Sure. My point is that if the reflink enablement isn't ready to go, > then from an XFS POV none of this matters in this cycle and we can > just leave the dependencies to commit via Andrew's tree. Hence by > the time we get to the reflink enablement all the prior dependencies > will have been merged and have stable commit IDs, and we can just > stage this series and the reflink enablement as we normally would in > the next cycle. > > However, if we don't get the XFS reflink dax enablement sorted out > in the next week or two, then we don't need this patchset in this > cycle. Hence if you still need this patchset for other code you need > to merge in this cycle, then you're the poor schmuck that has to run > the mm-tree conflict guantlet to get a stable commit ID for the > dependent patches in this cycle, not me Yup. Let's give it another week or so to see if the reflink rebase materializes and go from there.
[PATCH] tools/testing/nvdimm: remove unneeded flush_workqueue
From: ran jianping All work currently pending will be done first by calling destroy_workqueue, so there is no need to flush it explicitly. Reported-by: Zeal Robot Signed-off-by: ran jianping --- tools/testing/nvdimm/test/nfit.c | 1 - 1 file changed, 1 deletion(-) diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c index 1da76ccde448..e7e1a640e482 100644 --- a/tools/testing/nvdimm/test/nfit.c +++ b/tools/testing/nvdimm/test/nfit.c @@ -3375,7 +3375,6 @@ static __exit void nfit_test_exit(void) { int i; - flush_workqueue(nfit_wq); destroy_workqueue(nfit_wq); for (i = 0; i < NUM_NFITS; i++) platform_device_unregister(&instances[i]->pdev); -- 2.25.1