On Mon, 25 Sep 2017 21:35:02 +1000,
Damo Brisbane <dhatche...@gmail.com> wrote:

> Can someone point me to where I might go for a parallel @world
> build? It is really for my own curiosity at this time. Currently I
> stage binaries for multiple machines on a single nfs share, but the
> assumption is to use instead some distributed filesystem. So I think
> I just need a recipe, pointers or ideas on how to distribute emerge
> on an @world set? I am thinking granular first, i.e. per package
> rather than e.g. distributed gcc within a single package.

As others already pointed out, distcc introduces more headaches than
it solves.

If you are looking to improve package build performance, you get the
most benefit from building on tmpfs.

Beyond that, I also suggest going breadth first, i.e. building more
packages at the same time.

Your question implies depth first, which means having more compiler
processes running at a time for a single package. But most build
processes do not scale out very well, for the following reasons:

  1. Configure phases are serial processes

  2. Dependencies in Makefile are often buggy or incomplete

  3. Dependencies between source files often allow parallel
     building only for short bursts throughout the complete
     build and are serial otherwise

Building packages in parallel instead sidesteps all these problems:
each build phase can run in parallel with every other build phase. So
while one package runs its serialized configure phase or is being
bundled/merged, another package can have multiple gcc processes
running, and a third may build serially due to source file deps.

Also, emerge is very IO bound. Resorting to distcc won't solve this,
as a lot of compiler internals need to be copied back and forth between
the peers. It may even create more IO than building locally. Using
tmpfs instead solves this much better.

I'm using the following settings and see 100% load on all eight cores
almost all the time during emerge, while IO sits idle most of the time:

MAKEOPTS="-s -j9 -l8"
FEATURES="sfperms parallel-fetch parallel-install protect-owned \
userfetch splitdebug fail-clean cgroup compressdebug buildpkg \
binpkg-multi-instance clean-logs userpriv usersandbox"
EMERGE_DEFAULT_OPTS="--binpkg-respect-use=y --binpkg-changed-deps=y \
--jobs=10 --load-average=8 --keep-going --usepkg"

$ fgrep portage /etc/fstab
none /var/tmp/portage tmpfs noauto,x-systemd.automount,x-systemd.idle-timeout=60,size=32G,mode=770,uid=portage,gid=portage

Either have enough swap or lower the tmpfs allocation.
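
If a single huge package overflows the tmpfs, you can also opt it out
per package instead of growing swap. A minimal sketch using Portage's
package.env mechanism; the path and the atom are just examples:

$ cat /etc/portage/env/notmpfs.conf
PORTAGE_TMPDIR="/var/tmp/notmpfs"

$ cat /etc/portage/package.env
www-client/chromium notmpfs.conf

Portage then builds that package under /var/tmp/notmpfs/portage on
disk while everything else stays on tmpfs.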

The FEATURES buildpkg and binpkg-multi-instance allow reusing packages
on different but similar machines, and EMERGE_DEFAULT_OPTS makes use of
this. /usr/portage/{distfiles,packages} is on shared media.
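
On the client machines, it then boils down to pointing Portage at the
share. A minimal make.conf sketch, assuming the shared media is
mounted at /usr/portage as above:

DISTDIR="/usr/portage/distfiles"
PKGDIR="/usr/portage/packages"

With --usepkg and --binpkg-respect-use=y from EMERGE_DEFAULT_OPTS,
emerge prefers a matching binpkg from there and compiles only what's
missing.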

Also, I usually build world upgrades with --changed-deps to rebuild
reverse dependencies and update the binary packages that way.
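
That is just the usual world upgrade with the flag added, roughly:

$ emerge --update --deep --newuse --changed-deps=y @world

(--changed-deps=y is the explicit form; the other switches are the
usual update flags, not specific to this setup.)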

I'm not sure, though, whether running emerge in parallel on two
machines would pick up newly appearing binpkgs during the process... I
guess not. I usually don't do that unless the dep trees look
independent between the two machines.
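
If you try it anyway, it probably helps to regenerate the binpkg index
before the second machine starts resolving; I'd expect something like
this to do it, though I haven't verified it mid-run:

$ emaint --fix binhost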

If your machine cannot saturate the CPU throughout the whole emerge
process (as long as there are parallel ebuilds running), then distcc
will clearly not help you; it will make the complete process slower by
waiting on remote resources, and even increase the load. Only very few,
huge projects, with Makefile deps clearly optimized or specially
crafted for distributed builds, can benefit from distcc. Most projects
aren't of this type; even Chromium and LibreOffice are not. It's
exactly those projects that have way too much metadata to transport
between the distcc peers.

But YMMV. I'd say, try a different path first.


-- 
Regards,
Kai

Replies to list-only preferred.

