On Mon, 25 Sep 2017 21:35:02 +1000, Damo Brisbane <dhatche...@gmail.com> wrote:
> Can someone point where I might go for parallel @world build, it is
> really for my own curiosity at this time. Currently I stage binaries
> for multiple machines on a single nfs share, but the assumption is to
> use instead some distributed filesystem. So I think I just need a
> recipe, pointers or ideas on how to distribute emerge on an @world
> set? I am thinking granular first, ie per package rather than eg
> distributed gcc within a single package.

As others already pointed out, distcc introduces more headache than it
solves. If you are after package build performance, you get the most
profit from building on tmpfs.

Beyond that, I suggest going breadth-first, i.e. building more packages
at the same time. Your question implies depth-first, i.e. more compiler
processes running at a time for a single package. But most build
processes do not scale out very well, for the following reasons:

1. Configure phases are serial processes.
2. Dependencies in Makefiles are often buggy or incomplete.
3. Dependencies between source files often allow parallel building only
   in short bursts throughout the build and are serial otherwise.

Building packages in parallel instead side-steps all these problems:
each build phase can run in parallel with every other build phase. So
while one package sits in its serial configure phase or is being
bundled/merged, another package can have multiple gcc processes
running, and a third may build serially due to source file deps.

Also, emerge is very IO bound. Resorting to distcc won't solve this, as
a lot of compiler internals need to be copied back and forth between
the peers. It may even create more IO than building locally only. Using
tmpfs solves this much better.

I'm using the following settings and have 100% load on all eight cores
almost all the time during emerge, while IO is idle most of the time:

MAKEOPTS="-s -j9 -l8"

FEATURES="sfperms parallel-fetch parallel-install protect-owned \
    userfetch splitdebug fail-clean cgroup compressdebug buildpkg \
    binpkg-multi-instance clean-logs userpriv usersandbox"

EMERGE_DEFAULT_OPTS="--binpkg-respect-use=y --binpkg-changed-deps=y \
    --jobs=10 --load-average 8 --keep-going --usepkg"

$ fgrep portage /etc/fstab
none /var/tmp/portage tmpfs noauto,x-systemd.automount,x-systemd.idle-timeout=60,size=32G,mode=770,uid=portage,gid=portage

Either have enough swap or lower the tmpfs allocation.

The FEATURES buildpkg and binpkg-multi-instance allow reusing packages
on different but similar machines, and EMERGE_DEFAULT_OPTS makes use of
this. /usr/portage/{distfiles,packages} is on shared media.

Also, I usually build world upgrades with --changed-deps to rebuild
reverse dependencies and update the binary packages that way. I'm not
sure, though, whether running emerge in parallel on two machines would
pick up newly appearing binpkgs during the process... I guess not. I
usually don't do that unless the dep trees look independent between
both machines.

If your machine cannot saturate the CPU throughout the whole emerge
process (as long as there are parallel ebuilds running), then distcc
will clearly not help you: it will make the complete process slower due
to waiting on remote resources, and it will even increase the load.
Only very few, huge projects whose Makefile deps are clearly optimized
or specially crafted for distributed builds can benefit from distcc.
Most projects aren't of this type; even Chromium and LibreOffice don't
qualify. It is exactly such projects that have way too much meta data
to transport between the distcc peers. But YMMV.
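If you want to go down that binpkg-sharing road, here is a minimal
sketch of how a second machine could be wired up. The host name
"buildhost" and the NFS export paths are assumptions for illustration,
not taken from my setup, so adjust them to your layout:

# /etc/fstab on the second machine: mount distfiles and packages
# from the build host (hypothetical export paths)
buildhost:/usr/portage/distfiles  /usr/portage/distfiles  nfs  defaults  0 0
buildhost:/usr/portage/packages   /usr/portage/packages   nfs  defaults  0 0

# /etc/portage/make.conf on the second machine: also produce binpkgs,
# and prefer existing ones that match the requested USE flags
FEATURES="${FEATURES} buildpkg binpkg-multi-instance"
EMERGE_DEFAULT_OPTS="${EMERGE_DEFAULT_OPTS} --usepkg --binpkg-respect-use=y"

A world upgrade on that machine then reuses whatever the build host
already produced and only compiles the rest:

emerge --ask --update --deep --newuse --changed-deps=y @world

As said above, I wouldn't run the emerges on both machines at the same
time unless the dep trees look independent, since a running emerge
probably won't pick up binpkgs that appear mid-run.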
I'd say, try a different path first.

-- 
Regards,
Kai

Replies to list-only preferred.