For very large systems with hundreds of CPUs and TBs of RAM, booting can
take a very long time.

Initial reports showed that booting a configuration of several hundred
CPUs and 64TB of RAM would take more than 30 minutes and require the
kernel parameters udev.children-max=1024 and
systemd.default_timeout_start_sec=3600 to avoid dropping into emergency
mode.

Gathering information about what's happening during boot is a bit
challenging, but two main issues appeared: a large number of path
lookups for non-existent files, and high lock contention in the VFS
during path walks, particularly in the dentry allocation code path.

The underlying cause was believed to be the sheer number of sysfs
memory objects, 100,000+ for a 64TB memory configuration.

This patch series tries to reduce the locking needed during path walks,
based on the assumption that there are many path walks and that a fairly
large portion of them are for non-existent paths.

This was done by adding kernfs negative dentry caching (for non-existent
paths) to avoid the continual alloc/free cycle of dentries, and by
introducing a read/write semaphore to increase kernfs concurrency during
path walks.

With these changes, the kernel parameters udev.children-max=2048 and
systemd.default_timeout_start_sec=300 are still needed to get the
fastest boot times, and they result in a boot time of under 5 minutes.

There may be opportunities for further improvements, but the series here
has seen a fair amount of testing. Thinking about what else could be
done, and discussing it with Rick Lindsay, I suspect further improvements
will be more difficult to implement for somewhat less gain, so I think
what we have here is a good start for now.

I think what's needed now is patch review and, if we can get through
that, to send them via linux-next for broader exposure and hopefully
have them merged into mainline.
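For readers unfamiliar with negative caching, the idea described above
can be illustrated with a small user-space sketch. This is not the
kernfs code from the series: struct toy_dir, toy_add(), toy_scan() and
toy_lookup() are hypothetical names made up for this illustration, and
the per-directory revision counter is only an assumption about how a
cached miss might be validated cheaply. A lookup that fails is
remembered together with the directory revision, and repeated lookups
of the same missing name skip the expensive scan until the directory
changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NAMES 16
#define MAX_NEG   16
#define NAME_LEN  32

/* Hypothetical stand-in for a directory node; not a kernfs structure. */
struct toy_dir {
	char names[MAX_NAMES][NAME_LEN];  /* names that exist */
	int nr_names;
	unsigned long rev;                /* bumped whenever a name is added */
	char neg[MAX_NEG][NAME_LEN];      /* names recorded as absent */
	unsigned long neg_rev[MAX_NEG];   /* directory revision at record time */
	int nr_neg;
	unsigned long slow_lookups;       /* counts full scans, for illustration */
};

/* The "expensive" path: scan every existing name. */
static bool toy_scan(struct toy_dir *d, const char *name)
{
	d->slow_lookups++;
	for (int i = 0; i < d->nr_names; i++)
		if (strcmp(d->names[i], name) == 0)
			return true;
	return false;
}

/* Add a name; bumping rev makes every earlier negative entry stale. */
static bool toy_add(struct toy_dir *d, const char *name)
{
	if (d->nr_names == MAX_NAMES || strlen(name) >= NAME_LEN)
		return false;
	strcpy(d->names[d->nr_names++], name);
	d->rev++;
	return true;
}

/* Look up a name, consulting the negative cache before scanning. */
static bool toy_lookup(struct toy_dir *d, const char *name)
{
	int slot = -1;

	/* Fast path: a miss recorded at the current revision is still valid. */
	for (int i = 0; i < d->nr_neg; i++)
		if (d->neg_rev[i] == d->rev && strcmp(d->neg[i], name) == 0)
			return false;

	if (toy_scan(d, name))
		return true;

	if (strlen(name) >= NAME_LEN)
		return false;  /* too long to cache; still a miss */

	/* Record the miss: reuse a stale slot, else append, else overwrite 0. */
	for (int i = 0; i < d->nr_neg; i++)
		if (d->neg_rev[i] != d->rev) {
			slot = i;
			break;
		}
	if (slot < 0)
		slot = d->nr_neg < MAX_NEG ? d->nr_neg++ : 0;
	strcpy(d->neg[slot], name);
	d->neg_rev[slot] = d->rev;
	return false;
}
```

In this sketch, a second toy_lookup() of the same missing name returns
without calling toy_scan(), and a later toy_add() of that name makes the
cached miss stale so the next lookup scans again and finds it. The sketch
is single-threaded; the locking side of the series (the read/write
semaphore) is deliberately left out.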
---

Ian Kent (4):
      kernfs: switch kernfs to use an rwsem
      kernfs: move revalidate to be near lookup
      kernfs: improve kernfs path resolution
      kernfs: use revision to identify directory node changes


 fs/kernfs/dir.c             |  283 ++++++++++++++++++++++++++++--------------
 fs/kernfs/file.c            |    4 -
 fs/kernfs/inode.c           |   16 +-
 fs/kernfs/kernfs-internal.h |   29 ++++
 fs/kernfs/mount.c           |   12 +-
 fs/kernfs/symlink.c         |    4 -
 include/linux/kernfs.h      |    5 +
 7 files changed, 232 insertions(+), 121 deletions(-)

--
Ian