teemperor added a comment.

In D96033#2768940 <https://reviews.llvm.org/D96033#2768940>, @phosek wrote:
> In D96033#2767884 <https://reviews.llvm.org/D96033#2767884>, @teemperor wrote:
>
>> In D96033#2766502 <https://reviews.llvm.org/D96033#2766502>, @phosek wrote:
>>
>>> In D96033#2766372 <https://reviews.llvm.org/D96033#2766372>, @v.g.vassilev wrote:
>>>
>>>> In D96033#2766332 <https://reviews.llvm.org/D96033#2766332>, @phosek wrote:
>>>>
>>>>> We've started seeing `LLVM ERROR: out of memory` on our 2-stage LTO Linux builders after this change landed. It looks like linking `clang-repl` always fails on our bot, but I've also seen OOMs when linking `ClangCodeGenTests` and `FrontendTests`. Do you have any idea why this could be happening? We'd appreciate any help, since our bots have been broken for several days now.
>>>>
>>>> Ouch. Are the bot logs public? If not, maybe a stack trace could be useful. `clang-repl` combines a lot of libraries across LLVM and Clang that are usually compiled separately. For instance, we put in memory most of the Clang frontend, the backend, and the JIT. Could it be we are hitting some real limit?
>>>
>>> Yes, they are, see https://luci-milo.appspot.com/p/fuchsia/builders/prod/clang-linux-x64, but there isn't much information in there, unfortunately. It's possible that we're hitting some limit, but these bots use 32-core instances with 128 GB of RAM, which I'd hope is enough even for the LTO build.
>>
>> I think the specs are fine for just building with LTO, but I am not sure whether that's enough for the worst case when running `ninja -j320` with an LTO build (which is what your job is doing).
>> Can you try limiting your link jobs to something like 16 or 32 (e.g., `-DLLVM_PARALLEL_LINK_JOBS=32`)?
>>
>> (FWIW, your Go build script also crashes with OOM errors, so you really are running low on memory on that node.)
>
> `-j320` is only used for the first-stage compiler, which uses distributed compilation and no LTO; the second stage, which uses LTO and where we see this issue, uses the Ninja default, so `-j32` in this case.

I admit I don't really know the CI system on your node, but I assumed you're using `-j320` based on this output, which I got by clicking on "execution details" on the aborted stage of this build <https://luci-milo.appspot.com/ui/p/fuchsia/builders/prod/clang-linux-x64/b8846868883354028928/overview>:

  Executing command [
    '/b/s/w/ir/x/w/cipd/ninja',
    '-j320',
    'stage2-check-clang',
    'stage2-check-lld',
    'stage2-check-llvm',
    'stage2-check-polly',
  ]
  escaped for shell: /b/s/w/ir/x/w/cipd/ninja -j320 stage2-check-clang stage2-check-lld stage2-check-llvm stage2-check-polly
  in dir /b/s/w/ir/x/w/staging/llvm_build
  at time 2021-05-18T20:53:37.215574

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96033/new/

https://reviews.llvm.org/D96033

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
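The link-job cap suggested in the thread (`-DLLVM_PARALLEL_LINK_JOBS`) can be sized from the node's RAM. A minimal sketch of that arithmetic, where the 16 GB-per-LTO-link budget is an assumption for illustration only (the thread itself just suggests values of 16 or 32), using the 128 GB figure quoted for the bot:

```shell
# Sketch: derive an LTO link-job cap from available node RAM.
# Assumption (not from the thread): roughly 16 GB peak per LTO link.
mem_gb=128                          # the bot's RAM, per the thread
gb_per_link=16                      # hypothetical per-link memory budget
link_jobs=$((mem_gb / gb_per_link))
# Pass the resulting cap at configure time for the stage-2 build:
echo "cmake ... -DLLVM_PARALLEL_LINK_JOBS=${link_jobs}"
# prints: cmake ... -DLLVM_PARALLEL_LINK_JOBS=8
```

Unlike `ninja -jN`, which throttles all build steps, `LLVM_PARALLEL_LINK_JOBS` is a real LLVM CMake option that limits only concurrent link steps, so compiles can still use every core while the memory-hungry LTO links are serialized.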