On Fri, Jul 04, 2025 at 10:43:57AM +0800, Yuwen Chen wrote: > During path traversal, the generic_ci_match function may be called > multiple times. The number of memory allocations and releases > in it accounts for a relatively high proportion in the flamegraph. > This patch significantly reduces the number of memory allocations > in generic_ci_match through pre - allocation. > > Signed-off-by: Yuwen Chen <ywen.c...@foxmail.com> > --- > fs/ext4/namei.c | 2 +- > fs/f2fs/dir.c | 2 +- > fs/libfs.c | 33 ++++++++++++++++++++++++++++++--- > include/linux/fs.h | 8 +++++++- > 4 files changed, 39 insertions(+), 6 deletions(-) >
The reason the allocation is needed at all is because generic_ci_match() has to decrypt the encrypted on-disk filename from the dentry that it's matching against. It can't decrypt in-place, since the source buffer is in the pagecache which must not be modified. Hence, a separate destination buffer is needed. Filenames have a maximum length of NAME_MAX, i.e. 255, bytes. It would be *much* simpler to just allocate that on the stack. And we almost can. 255 bytes is on the high end of what can be acceptable to allocate on the stack in the kernel. However, here it would give a lot of benefit and would always occur close to the leaves in the call graph. So the size is not a barrier here, IMO. The real problem is, once again, the legacy crypto_skcipher API, which requires that the source/destination buffers be provided as scatterlists. In Linux, the kernel stack can be in the vmalloc area. Thus, the buffers passed to crypto_skcipher cannot be stack buffers unless the caller actually is aware of how to turn a vmalloc'ed buffer into a scatterlist, which is hard to do. (See verity_ahash_update() in drivers/md/dm-verity-target.c for an example.) Fortunately, I'm currently in the process of introducing library APIs that will supersede these legacy crypto APIs. They'll be simpler and faster and won't have these silly limitations like not working on virtual addresses... I plan to make fscrypt use the library APIs instead of the legacy crypto API. It will take some time to land everything, though. We can consider this patchset as a workaround in the mean time. But it's sad to see the legacy crypto API continue to cause problems and more time be wasted on these problems. I do wonder if the "turn a vmalloc'ed buffer into a scatterlist" trick that some code in the kernel uses is something that would be worth adopting for now in fname_decrypt(). As I mentioned above, it's hard to do (you have to go page by page), but it's possible. That would allow immediately moving generic_ci_match() to use a stack allocation, which would avoid adding all the complexity of the preallocation that you have in this patchset. - Eric _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel