teemperor added a comment.

In D96033#2768940 <https://reviews.llvm.org/D96033#2768940>, @phosek wrote:

> In D96033#2767884 <https://reviews.llvm.org/D96033#2767884>, @teemperor wrote:
>
>> In D96033#2766502 <https://reviews.llvm.org/D96033#2766502>, @phosek wrote:
>>
>>> In D96033#2766372 <https://reviews.llvm.org/D96033#2766372>, @v.g.vassilev 
>>> wrote:
>>>
>>>> In D96033#2766332 <https://reviews.llvm.org/D96033#2766332>, @phosek wrote:
>>>>
>>>>> We've started seeing `LLVM ERROR: out of memory` on our 2-stage LTO Linux 
>>>>> builders after this change landed. It looks like linking `clang-repl` 
>>>>> always fails on our bot, but I've also seen OOM when linking 
>>>>> `ClangCodeGenTests` and `FrontendTests`. Do you have any idea why this 
>>>>> could be happening? We'd appreciate any help since our bots have been 
>>>>> broken for several days now.
>>>>
>>>> Ouch. Are the bot logs public? If not, maybe a stack trace could be 
>>>> useful. `clang-repl` combines a lot of libraries across llvm and clang 
>>>> that are usually compiled separately. For instance, we keep in memory most 
>>>> of the clang frontend, the backend, and the JIT. Could it be that we are 
>>>> hitting some real limit?
>>>
>>> Yes, they are; see 
>>> https://luci-milo.appspot.com/p/fuchsia/builders/prod/clang-linux-x64, but 
>>> unfortunately there isn't much information in there. It's possible that 
>>> we're hitting some limit, but these bots use 32-core instances with 128GB 
>>> RAM, which I'd hope is enough even for an LTO build.
>>
>> I think the specs are fine for just building with LTO, but I am not sure 
>> whether that's enough for the worst case when running `ninja -j320` with an 
>> LTO build (which is what your job is doing). Can you try limiting your link 
>> jobs to something like 16 or 32 (e.g., `-DLLVM_PARALLEL_LINK_JOBS=32`)?
>>
>> (FWIW, your go build script also crashes with OOM errors, so you really are 
>> running low on memory on that node.)
>
> `-j320` is only used for the first-stage compiler, which uses distributed 
> compilation and no LTO. The second stage, which uses LTO and is where we see 
> this issue, uses the Ninja default, so `-j32` in this case.
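
For reference, the cap on parallel link jobs suggested above can be applied at 
configure time. This is only a sketch: the build type, LTO mode, job count, 
and source path below are illustrative placeholders, not the bot's actual 
configuration.

```shell
# Hypothetical CMake configure step; adjust values for your setup.
# LLVM_PARALLEL_LINK_JOBS creates a Ninja job pool that limits how many
# link steps run concurrently, independent of the compile parallelism.
cmake -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_LTO=Full \
  -DLLVM_PARALLEL_LINK_JOBS=16 \
  ../llvm
```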

I admit I don't really know the CI system on your node, but I assumed you were 
using `-j320` based on this output, which I got by clicking on "execution 
details" on the aborted stage of this build 
<https://luci-milo.appspot.com/ui/p/fuchsia/builders/prod/clang-linux-x64/b8846868883354028928/overview>:

  Executing command [
    '/b/s/w/ir/x/w/cipd/ninja',
    '-j320',
    'stage2-check-clang',
    'stage2-check-lld',
    'stage2-check-llvm',
    'stage2-check-polly',
  ]
  escaped for shell: /b/s/w/ir/x/w/cipd/ninja -j320 stage2-check-clang 
stage2-check-lld stage2-check-llvm stage2-check-polly
  in dir /b/s/w/ir/x/w/staging/llvm_build
  at time 2021-05-18T20:53:37.215574
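
The memory concern above amounts to simple arithmetic: peak usage is roughly 
the number of concurrent link jobs times the peak memory of one LTO link. The 
per-link figure below is purely an illustrative assumption, not a measurement 
from these bots:

```python
# Back-of-the-envelope check: do N concurrent LTO links fit in RAM?
# The ~10 GiB peak per link is an assumed figure for illustration only.
def fits_in_ram(link_jobs, peak_gib_per_link, ram_gib):
    """Return True if worst-case concurrent link memory stays within RAM."""
    return link_jobs * peak_gib_per_link <= ram_gib

# 32 concurrent links at ~10 GiB each would need ~320 GiB, far over 128 GiB.
print(fits_in_ram(32, 10, 128))  # False
# Capping link jobs at 12 keeps the worst case at ~120 GiB.
print(fits_in_ram(12, 10, 128))  # True
```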


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96033/new/

https://reviews.llvm.org/D96033
