MuteBardTison opened a new pull request, #46034:
URL: https://github.com/apache/arrow/pull/46034
### Rationale for this change

Arrow's current CPU thread count detection uses `std::thread::hardware_concurrency()`, which does not take into account the process-level CPU affinity mask (e.g., one set via `taskset`). This can lead to thread oversubscription and performance issues when Arrow runs in constrained environments.

This PR updates the internal `CpuInfo` logic to use `sched_getaffinity()` on Linux, so that Arrow respects the number of cores actually available to the process.

### What changes are included in this PR?

- Added `affinity.h` to expose `GetAffinityCpuCount()` on Linux
- Updated `CpuInfo::Impl` in `cpu_info.cc` to use `GetAffinityCpuCount()` instead of the raw `std::thread::hardware_concurrency()`
- Added a new unit test in `cpu_info_test.cc` that validates this behavior against `sched_getaffinity()` on Linux
- Guarded the Linux-specific code with `#ifdef __linux__` to preserve cross-platform compatibility

### Are these changes tested?

Yes ✅ A Linux-only unit test (`CpuInfoTest.CpuAffinity`) compares the result of `CpuInfo::num_cores()` with the number of CPUs in the affinity mask reported by `sched_getaffinity()`.

### Are there any user-facing changes?

No changes to public APIs. Behavioral changes are limited to the internal CPU thread detection logic on Linux.

---

Original issue: https://github.com/apache/arrow/issues/45860
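The Linux path described in this PR can be approximated with a small helper. The following is a minimal sketch, not the PR's actual `GetAffinityCpuCount()`: the name `AffinityCpuCountSketch()` is hypothetical, and the sketch assumes glibc's `sched_getaffinity()` and `CPU_COUNT()`, falling back to `std::thread::hardware_concurrency()` on other platforms or when the affinity query fails.

```cpp
// Minimal sketch; AffinityCpuCountSketch() is a hypothetical name, not Arrow's API.
// glibc only exposes sched_getaffinity()/CPU_COUNT() with _GNU_SOURCE, which
// g++ usually defines already; define it here before any include just in case.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif

#include <thread>

#ifdef __linux__
#include <sched.h>
#endif

// Returns the number of CPUs the calling thread may run on, or a fallback count.
int AffinityCpuCountSketch() {
#ifdef __linux__
  cpu_set_t mask;
  CPU_ZERO(&mask);
  // pid 0 queries the calling thread. Threads inherit their creator's mask,
  // so a restriction applied at launch via `taskset` is visible here.
  if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
    int count = CPU_COUNT(&mask);
    if (count > 0) {
      return count;
    }
  }
  // Falls through on error, e.g. EINVAL when the kernel's CPU mask is larger
  // than the fixed-size cpu_set_t (CPU_SETSIZE, typically 1024 CPUs).
#endif
  // Non-Linux platforms, or Linux when the affinity query failed.
  unsigned int hw = std::thread::hardware_concurrency();
  // hardware_concurrency() may return 0 when the value is not computable.
  return hw > 0 ? static_cast<int>(hw) : 1;
}
```

For example, a program that prints this value when launched under `taskset -c 0,1` would be expected to report 2 on a machine where CPUs 0 and 1 are online. A more robust version would likely size the mask dynamically with `CPU_ALLOC()` and `CPU_ALLOC_SIZE()` instead of falling back on hosts with more than `CPU_SETSIZE` CPUs.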
