On Wednesday, February 23rd, 2022 at 9:49 AM, Christopher Baines <m...@cbaines.net> wrote:
> Ricardo Wurmus <rek...@elephly.net> writes:
>
> > Ricardo Wurmus <rek...@elephly.net> writes:
> >
> > > Hi Guix,
> > >
> > > I had to manually run the build of llvm 11 on aarch64, because it
> > > would keep timing out:
> > >
> > > time guix build /gnu/store/0hc7inxqcczb8mq2wcwrcw0vd3i2agkv-llvm-11.0.0.drv --timeout=999999 --max-silent-time=999999
> > >
> > > After more than two days it finally built. This seems a little
> > > excessive. Towards the end of the build I saw a 1-percentage-point
> > > progress increase for every hour that passed.
> > >
> > > Is there something wrong with the build nodes, are we building
> > > llvm 11 wrong, or is this just the way it is on aarch64 systems?
> >
> > I now see that gfortran 10 also takes a very long time to build. It’s
> > on kreuzberg (10.0.0.9), and I see that out of the 16 cores only one
> > is really busy. Other cores sometimes come in with a tiny bit of
> > work, but you might miss it if you blink.
> >
> > Guix ran “make -j 16” at the top level, but the other make processes
> > that have been spawned as children do not have “-j 16”. There are
> > probably 16 or so invocations of cc1plus, but only CPU0 seems to be
> > busy at 100% while the others are at 0.
> >
> > What’s up with that?
>
> Regarding the llvm derivation you mentioned [1], it looks like for
> bordeaux.guix.gnu.org the build completed in around a couple of hours,
> though that was on the 4-core Overdrive machine.
>
> 1: https://data.guix.gnu.org/gnu/store/0hc7inxqcczb8mq2wcwrcw0vd3i2agkv-llvm-11.0.0.drv
>
> On the subject of the HoneyComb machines, I haven't noticed anything
> like you describe with the one (hatysa) running behind
> bordeaux.guix.gnu.org. Most cores are fully occupied most of the time,
> with the 15m load average sitting around 16.
>
> Some things to check, though: what does the load average look like
> when you think the system should be using all its cores? If it's high
> but there's not much CPU utilisation, that suggests there's a
> bottleneck somewhere else.
>
> Also, what does the memory and swap usage look like? Hatysa has 32GB
> of memory and swap, and ideally it would actually have 64GB, since
> that would make it swap less often.

One thing I remember about building LLVM a number of years ago, when I
was working on it through my job (though only for x86-64, not aarch64),
is that the build is very memory intensive. In particular, linking the
various binaries would each be quite slow and consume a lot of memory,
causing significant, intense swapping with less than 64GB of memory in
a parallel build (and sometimes eventually triggering the OOM killer).
As I recall, using ld.bfd for the build was by far the slowest, ld.gold
was noticeably better, and ld.lld was showing promise for doing better
than ld.gold.
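In case it's useful as a starting point, the knobs I remember for this
live in LLVM's own CMake configuration. This is a sketch from memory
rather than a recipe tested against LLVM 11 (or against the Guix
package definition), so the option names and values deserve a check
against the LLVM 11 CMake docs:

    # Illustrative configure step for a standalone LLVM build.
    # LLVM_USE_LINKER selects the linker used for LLVM's own binaries
    # ("gold" or "lld" instead of the default BFD ld).
    # LLVM_PARALLEL_LINK_JOBS caps how many link jobs run at once
    # (honoured with the Ninja generator), bounding peak memory use
    # during the link-heavy tail of the build.
    cmake -G Ninja ../llvm \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_USE_LINKER=gold \
        -DLLVM_PARALLEL_LINK_JOBS=2
    ninja

Capping the link jobs while leaving compilation fully parallel was, in
my experience, the cheapest way to keep a parallel LLVM build from
swapping itself to death.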
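On the earlier observation that the child make processes lack “-j 16”:
GNU make normally hands parallelism down to sub-makes through MAKEFLAGS
and its jobserver, so the children not carrying an explicit -j is not
necessarily wrong by itself. One way to check (with <pid> standing in
for one of the child make processes picked out of ps) is to inspect
the inherited environment:

    # A sub-make under a parallel build inherits MAKEFLAGS from its
    # parent, typically containing "-j" plus a jobserver handshake
    # such as --jobserver-auth=R,W (--jobserver-fds on older makes).
    tr '\0' '\n' < /proc/<pid>/environ | grep MAKEFLAGS

If MAKEFLAGS carries the jobserver option but the build log contains
“warning: jobserver unavailable: using -j1”, the usual culprit is a
makefile rule that invokes “make” directly instead of “$(MAKE)”; the
parent then withholds the jobserver file descriptors and that child
falls back to running serially, which would match only CPU0 being
busy.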
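And on the load-average questions above: telling CPU starvation apart
from swapping or an I/O bottleneck usually takes nothing more exotic
than the stock procps/sysstat tools:

    uptime         # load average relative to the core count
    free -h        # memory and swap actually in use
    vmstat 5       # sustained "si"/"so" columns mean swapping;
                   # a high "wa" column means blocked on I/O
    iostat -x 5    # per-device utilisation (from the sysstat package)

A load average around 16 combined with low CPU utilisation and a high
"wa" would point at the disk (or at swapping) rather than at the build
itself.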
Just my $0.02 of past experiences, in case it helps to explain the
slow aarch64 build of LLVM 11.

Cheers,
Kaelyn

> One problem I have observed with hatysa is storage
> instability/performance issues. Looking in /var/log/messages, I see
> things like the following. Maybe check /var/log/messages for anything
> similar?
>
> nvme nvme0: I/O 0 QID 6 timeout, aborting
> nvme nvme0: I/O 1 QID 6 timeout, aborting
> nvme nvme0: I/O 2 QID 6 timeout, aborting
> nvme nvme0: I/O 3 QID 6 timeout, aborting
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
>
> Lastly, I'm not quite sure what thermal problems look like on ARM,
> but maybe check the CPU temperatures. I see between 60 and 70 degrees
> Celsius as reported by the sensors command, though this is with a
> different CPU cooler.
>
> Chris