Re: [GIT PULL] LIBNVDIMM update for v5.18
The pull request you sent on Tue, 29 Mar 2022 13:54:41 -0700: > git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm > tags/libnvdimm-for-5.18 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/ee96dd9614f1c139e719dd2f296acbed7f1ab4b8 Thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/prtracker.html
Re: [PATCH v11 1/8] dax: Introduce holder for dax_device
On Wed, Mar 30, 2022 at 08:49:29AM -0700, Christoph Hellwig wrote: > On Wed, Mar 30, 2022 at 06:58:21PM +0800, Shiyang Ruan wrote: > > As the code I pasted before, pmem driver will subtract its ->data_offset, > > which is byte-based. And the filesystem who implements ->notify_failure() > > will calculate the offset in unit of byte again. > > > > So, leave its function signature byte-based, to avoid repeated conversions. > > I'm actually fine either way, so I'll wait for Dan to comment. FWIW I'd convinced myself that the reason for using byte units is to make it possible to reduce the pmem failure blast radius to subpage units... but then I've also been distracted for months. :/ --D
Re: [PATCH v11 7/8] xfs: Implement ->notify_failure() for XFS
On Wed, Mar 30, 2022 at 11:16:10PM +0800, Shiyang Ruan wrote: > > > +#if IS_ENABLED(CONFIG_MEMORY_FAILURE) && IS_ENABLED(CONFIG_FS_DAX) > > > > No real need for the IS_ENABLED. Also any reason to even build this > > file if the options are not set? It seems like > > xfs_dax_holder_operations should just be defined to NULL and the > > whole file not supported if we can't support the functionality. > > Got it. These two CONFIG seem not related for now. So, I think I should > wrap these code with #ifdef CONFIG_MEMORY_FAILURE here, and add > `xfs-$(CONFIG_FS_DAX) += xfs_notify_failure.o` in the makefile. I'd do ifeq ($(CONFIG_MEMORY_FAILURE),y) xfs-$(CONFIG_FS_DAX) += xfs_notify_failure.o endif in the makefile and keep it out of the actual source file entirely. > > > + > > > + /* Ignore the range out of filesystem area */ > > > + if ((offset + len) < ddev_start) > > > > No need for the inner braces. > > > > > + if ((offset + len) > ddev_end) > > > > No need for the braces either. > > Really no need? It is to make sure the range to be handled won't out of the > filesystem area. And make sure the @offset and @len are valid and correct > after subtract the bbdev_start. Yes, but there is no need for the braces per the precedence rules in C. So these could be: if (offset + len < ddev_start) and if (offset + len > ddev_end)
Re: [PATCH v11 1/8] dax: Introduce holder for dax_device
On Wed, Mar 30, 2022 at 06:58:21PM +0800, Shiyang Ruan wrote: > As the code I pasted before, pmem driver will subtract its ->data_offset, > which is byte-based. And the filesystem who implements ->notify_failure() > will calculate the offset in unit of byte again. > > So, leave its function signature byte-based, to avoid repeated conversions. I'm actually fine either way, so I'll wait for Dan to comment.
Re: [PATCH v11 7/8] xfs: Implement ->notify_failure() for XFS
在 2022/3/30 14:00, Christoph Hellwig 写道: @@ -1892,6 +1893,8 @@ xfs_free_buftarg( list_lru_destroy(&btp->bt_lru); blkdev_issue_flush(btp->bt_bdev); + if (btp->bt_daxdev) + dax_unregister_holder(btp->bt_daxdev, btp->bt_mount); fs_put_dax(btp->bt_daxdev); kmem_free(btp); @@ -1939,6 +1942,7 @@ xfs_alloc_buftarg( struct block_device *bdev) { xfs_buftarg_t *btp; + int error; btp = kmem_zalloc(sizeof(*btp), KM_NOFS); @@ -1946,6 +1950,14 @@ xfs_alloc_buftarg( btp->bt_dev = bdev->bd_dev; btp->bt_bdev = bdev; btp->bt_daxdev = fs_dax_get_by_bdev(bdev, &btp->bt_dax_part_off); + if (btp->bt_daxdev) { + error = dax_register_holder(btp->bt_daxdev, mp, + &xfs_dax_holder_operations); + if (error) { + xfs_err(mp, "DAX device already in use?!"); + goto error_free; + } + } It seems to me that just passing the holder and holder ops to fs_dax_get_by_bdev and the holder to dax_unregister_holder would significantly simply the interface here. Dan, what do you think? +#if IS_ENABLED(CONFIG_MEMORY_FAILURE) && IS_ENABLED(CONFIG_FS_DAX) No real need for the IS_ENABLED. Also any reason to even build this file if the options are not set? It seems like xfs_dax_holder_operations should just be defined to NULL and the whole file not supported if we can't support the functionality. Got it. These two CONFIG seem not related for now. So, I think I should wrap these code with #ifdef CONFIG_MEMORY_FAILURE here, and add `xfs-$(CONFIG_FS_DAX) += xfs_notify_failure.o` in the makefile. Dan: not for this series, but is there any reason not to require MEMORY_FAILURE for DAX to start with? + + ddev_start = mp->m_ddev_targp->bt_dax_part_off; + ddev_end = ddev_start + + (mp->m_ddev_targp->bt_bdev->bd_nr_sectors << SECTOR_SHIFT) - 1; This should use bdev_nr_bytes. OK. But didn't we say we don't want to support notifications on partitioned devices and thus don't actually need all this? + + /* Ignore the range out of filesystem area */ + if ((offset + len) < ddev_start) No need for the inner braces. + if ((offset + len) > ddev_end) No need for the braces either. Really no need? It is to make sure the range to be handled won't out of the filesystem area. And make sure the @offset and @len are valid and correct after subtract the bbdev_start. diff --git a/fs/xfs/xfs_notify_failure.h b/fs/xfs/xfs_notify_failure.h new file mode 100644 index ..76187b9620f9 --- /dev/null +++ b/fs/xfs/xfs_notify_failure.h @@ -0,0 +1,10 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2022 Fujitsu. All Rights Reserved. + */ +#ifndef __XFS_NOTIFY_FAILURE_H__ +#define __XFS_NOTIFY_FAILURE_H__ + +extern const struct dax_holder_operations xfs_dax_holder_operations; + +#endif /* __XFS_NOTIFY_FAILURE_H__ */ Dowe really need a new header for this vs just sequeezing it into xfs_super.h or something like that? Yes, I'll move it into xfs_super.h. The xfs_notify_failure.c was splitted from xfs_super.c in the old patch. There is no need to create a header file for only single line of code. diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index e8f37bdc8354..b8de6ed2c888 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -353,6 +353,12 @@ xfs_setup_dax_always( return -EINVAL; } + if (xfs_has_reflink(mp) && !xfs_has_rmapbt(mp)) { + xfs_alert(mp, + "need rmapbt when both DAX and reflink enabled."); + return -EINVAL; + } Right now we can't even enable reflink with DAX yet, so adding this here seems premature - it should go into the patch allowing DAX+reflink. Yes. I'll remove it for now. -- Thanks, Ruan.
Re: [PATCH v11 1/8] dax: Introduce holder for dax_device
在 2022/3/30 18:13, Christoph Hellwig 写道: On Wed, Mar 30, 2022 at 06:03:01PM +0800, Shiyang Ruan wrote: Because I am not sure if the offset between each layer is page aligned. For example, when pmem dirver handles ->memory_failure(), it should subtract its ->data_offset when it calls dax_holder_notify_failure(). If they aren't, none of the DAX machinery would work. OK. Got it. So, use page-based function signature for ->memory_failure(): int (*memory_failure)(struct dev_pagemap *pgmap, unsigned long pfn, unsigned long nr_pfns, int flags); As the code I pasted before, pmem driver will subtract its ->data_offset, which is byte-based. And the filesystem who implements ->notify_failure() will calculate the offset in unit of byte again. So, leave its function signature byte-based, to avoid repeated conversions. int (*notify_failure)(struct dax_device *dax_dev, u64 offset, u64 len, int mf_flags); What do you think? -- Thanks, Ruan.
Re: [PATCH v11 1/8] dax: Introduce holder for dax_device
On Wed, Mar 30, 2022 at 06:03:01PM +0800, Shiyang Ruan wrote: > > Because I am not sure if the offset between each layer is page aligned. For > example, when pmem dirver handles ->memory_failure(), it should subtract its > ->data_offset when it calls dax_holder_notify_failure(). If they aren't, none of the DAX machinery would work.
Re: [PATCH v11 1/8] dax: Introduce holder for dax_device
在 2022/3/30 13:41, Christoph Hellwig 写道: On Wed, Mar 16, 2022 at 09:46:07PM +0800, Shiyang Ruan wrote: Forgive me if this has been discussed before, but since dax_operations are in terms of pgoff and nr pages and memory_failure() is in terms of pfns what was the rationale for making the function signature byte based? Maybe I didn't describe it clearly... The @offset and @len here are byte-based. And so is ->memory_failure(). Yes, but is there a good reason for that when the rest of the DAX code tends to work in page chunks? Because I am not sure if the offset between each layer is page aligned. For example, when pmem dirver handles ->memory_failure(), it should subtract its ->data_offset when it calls dax_holder_notify_failure(). The implementation of ->memory_failure() by pmem driver: +static int pmem_pagemap_memory_failure(struct dev_pagemap *pgmap, + phys_addr_t addr, u64 len, int mf_flags) +{ + struct pmem_device *pmem = + container_of(pgmap, struct pmem_device, pgmap); + u64 offset = addr - pmem->phys_addr - pmem->data_offset; + + return dax_holder_notify_failure(pmem->dax_dev, offset, len, mf_flags); +} So, I choose u64 as the type of @len. And for consistency, the @addr is using byte-based type as well. > memory_failure() > |* fsdax case > | > |pgmap->ops->memory_failure() => pmem_pgmap_memory_failure() > | dax_holder_notify_failure() => the offset from 'pmem driver' to 'dax holder' > | dax_device->holder_ops->notify_failure() => > | - xfs_dax_notify_failure() > | |* xfs_dax_notify_failure() > | |-- > | | xfs_rmap_query_range() > | |xfs_dax_failure_fn() > | |* corrupted on metadata > | | try to recover data, call xfs_force_shutdown() > | |* corrupted on file data > | | try to recover data, call mf_dax_kill_procs() > |* normal case > |- > |mf_generic_kill_procs() -- Thanks, Ruan.
Re: [PATCH v6 3/6] mm: rmap: introduce pfn_mkclean_range() to cleans PTEs
On Wed, Mar 30, 2022 at 03:31:37PM +0800, Muchun Song wrote: > I saw Shiyang is ready to rebase onto this patch. So should I > move it to linux/mm.h or let Shiyang does? Good question. I think Andrew has this series in -mm and ready to go to Linus, so maybe it is best if we don't change too much. Andrew, can you just fold in the trivial comment fix?
Re: [PATCH v6 3/6] mm: rmap: introduce pfn_mkclean_range() to cleans PTEs
On Wed, Mar 30, 2022 at 1:47 PM Christoph Hellwig wrote: > > On Tue, Mar 29, 2022 at 09:48:50PM +0800, Muchun Song wrote: > > + * * Return the start of user virtual address at the specific offset within > > Double "*" here. Thanks for pointing out this. > > Also Shiyang has been wanting a quite similar vma_pgoff_address for use > in dax.c. Maybe we'll need to look into moving this to linux/mm.h. > I saw Shiyang is ready to rebase onto this patch. So should I move it to linux/mm.h or let Shiyang does? Thanks.