dfanache opened a new pull request, #18868:
URL: https://github.com/apache/nuttx/pull/18868
## Summary
Currently, three top-level assignments in `tools/Config.mk` use the form
`export VAR ?= $(... ${shell ...} ...)`, which produces a recursively expanded
(lazy) variable and slows down the build.
The right-hand side is re-expanded, and the embedded `${shell ...}` rerun,
every time the variable is used. Because these variables are also `export`ed,
make expands them each time it sets up a recipe's environment, spawning the
`tools/incdir` / `tools/define` host helpers hundreds of times over a full build.
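
To make the cost visible, here is a minimal standalone sketch, not taken from `tools/Config.mk`. The names `LAZY` and `COUNT_FILE` are hypothetical; the makefile appends one line to a counter file each time the right-hand side is expanded. It assumes GNU make and a POSIX shell.

```make
# Hypothetical demo (names are illustrative, not from tools/Config.mk).
COUNT_FILE := $(CURDIR)/lazy_demo.count
$(shell rm -f $(COUNT_FILE))

# Recursively expanded and exported: the $(shell ...) below runs again
# each time make builds the environment for a recipe.
export LAZY ?= $(shell echo x >> $(COUNT_FILE); echo value)

# Single-line "target: ; recipe" form avoids tab-sensitivity in this sketch.
all: a b c ; @echo "shell ran $$(wc -l < $(COUNT_FILE)) time(s)"
a b c: ; @echo "building $@"
```

Running `make -f` on this file should report a count that grows with the number of targets that have recipes. Changing `?=` to `:=` should reduce it to a single evaluation at parse time.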
This PR replaces the three lines with:
```
ifeq ($(origin VAR),undefined)
VAR := $(... ${shell ...} ...)
endif
export VAR
```
Here, `:=` defines a simply expanded variable, so the shell command runs just
once, at parse time. The `ifeq ...` wrapper is meant to preserve the override
semantics of `?=` (passing `DEFINE_PREFIX=...` on the make command line or as
an env variable cancels the assignment).
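
The override behaviour can be checked in isolation with a small sketch. `DEMO_PREFIX` is a hypothetical variable used only for illustration, and the `echo` stands in for the real host helpers.

```make
# Hypothetical variable name, for illustration only.
ifeq ($(origin DEMO_PREFIX),undefined)
# Simply expanded: the shell command runs once, while the makefile is parsed.
DEMO_PREFIX := $(shell echo demo-value)
endif
export DEMO_PREFIX

# Print the final value and where make thinks it came from.
all: ; @echo "DEMO_PREFIX=$(DEMO_PREFIX) (origin: $(origin DEMO_PREFIX))"
```

With a plain `make`, the guarded assignment applies and the origin should be reported as `file`. With `make DEMO_PREFIX=x` or `DEMO_PREFIX=x make`, the guard is skipped and the origin should be `command line` or `environment` respectively, which is the same outcome `?=` would give.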
## Impact
**Developers and build systems**: noticeably faster full builds on
multi-core hosts; no behaviour change - the override semantics of `?=` are
preserved.
Measured on a 20-thread build host, wall time drops by roughly 26-27% for the
two board configurations listed under Testing.
## Testing
Host: Linux, 20-thread x86_64, arm-none-eabi-gcc 15.2.1, GNU make 4.4.1.
Methodology: three full clean builds per side (patched/unpatched), each
preceded by `make distclean` and a fresh `configure.sh + olddefconfig`. The
patched and unpatched runs were taken back to back on the same host, with the
same `-j20` invocation. The times reported in seconds are medians over the
three runs; variance was always under 0.2s.
### Config 1: `raspberrypi-pico-2:nsh` (rp23xx, Cortex-M33)
Extra apps enabled to increase the number of recipes: `TESTING_OSTEST`,
`SYSTEM_SYSTEM_TOP`, `TESTING_GETPRIME`, `BENCHMARKS_COREMARK`,
`BENCHMARKS_DHRYSTONE`, `LIBC_FLOATINGPOINT`.
| | Baseline | Patched | Change |
|---|---|---|---|
| Wall | 9.67s | 7.15s | **-2.52s, -26%** |
| User | 75.48s | 73.51s | -1.97s |
| Sys | 25.09s | 23.36s | -1.73s |
The ELF was flashed to a Raspberry Pi Pico 2 W; ostest ran to completion
with exit status 0.
### Config 2: `imxrt1064-evk:netnsh` (i.MX RT1064, Cortex-M7)
Default config, no app overrides. No hardware test; build success only.
| | Baseline | Patched | Change |
|---|---|---|---|
| Wall | 12.37s | 9.01s | **-3.36s, -27%** |
| User | 106.57s | 95.38s | -11.19s |
| Sys | 35.14s | 30.16s | -4.98s |
## Larger scale impact and fix origins
This optimization came about while trying to speed up the build of an
RP2350-based custom PX4-Autopilot board target, where the build time drops from
126.01s to 14.69s wall time (an 88% reduction, about 8.6x faster).
In PX4, NuttX is built through CMake, and multiple isolated sub-makes are
spawned (each NuttX library is a separate `add_custom_command` invoking
`make -C <libdir>`, and the wrappers reset `MAKELEVEL=0`, probably to avoid
jobserver collisions). The speed increase is therefore proportionally larger
there, because the bug fires per sub-make rather than once.
I am not including timing logs of the PX4 builds as evidence for now; they
are mentioned only as context on how the cost of the existing inefficiency
grows with build complexity in other projects that include the OS.