Konstantin Ryabitsev <konstan...@linuxfoundation.org> wrote:
> Hello:
>
> Is there any specific logic for mixing --batch-size and --jobs? On a system
> with plenty of CPUs and lots of RAM, does it make sense to have more --jobs,
> larger --batch-size, or some balance of both?

--jobs will be bound by I/O capability in your case. SATA-2 vs SATA-3
vs NVMe makes a notable difference, as does the quality of the device
(MLC, TLC, QLC; cache/controller).

Xapian seems to do better with bigger batch sizes, up to a point.
I'm not sure I have enough RAM to accurately test >8m batch sizes
(since we also need to account for kernel caching).

	batch-size * (jobs - 1) = rough total batch size

If it's the initial index creation, I would definitely use
--no-fsync, too. Perhaps that should be the default for new indices.

Also note: the recent RFC for --sequential-commit doesn't seem to be
working out performance-wise on my SATA-2 system; but I'm also
not sure about SSD life/degradation.
--
unsubscribe: one-click, see List-Unsubscribe header
archive: https://public-inbox.org/meta/
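
To make the rough estimate from the reply (batch-size * (jobs - 1)) concrete, here is a minimal Python sketch of the arithmetic. The helper names parse_size and rough_total_batch are hypothetical, and treating the k/m/g suffixes as binary multiples of bytes is an assumption not stated in the thread. It also assumes jobs >= 2, since the formula yields zero otherwise.

```python
# Sketch of: batch-size * (jobs - 1) = rough total batch size
# Assumption: size suffixes k/m/g are binary multiples of bytes.
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text: str) -> int:
    """Parse a size string such as '8m' or '512k' into bytes (hypothetical helper)."""
    text = text.strip().lower()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def rough_total_batch(batch_size: str, jobs: int) -> int:
    """Return the rough total batch size in bytes for the given settings."""
    if jobs < 2:
        # Assumption: the estimate is only meaningful with at least two jobs.
        raise ValueError("estimate assumes jobs >= 2")
    return parse_size(batch_size) * (jobs - 1)


if __name__ == "__main__":
    total = rough_total_batch("8m", 4)
    print(f"{total} bytes ({total // 1024 ** 2} MiB)")
```

For example, a batch size of 8m with 4 jobs gives 3 * 8 MiB = 24 MiB under these assumptions. As the reply notes, that total has to fit in RAM alongside whatever the kernel needs for caching.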