Laurent Dufour <lduf...@linux.ibm.com> writes:

> On 02/04/2021 at 15:34, Nathan Lynch wrote:
>> Laurent Dufour <lduf...@linux.ibm.com> writes:
>>> When a CPU is hot added, the CPU ids are taken from the available
>>> mask starting from the lowest possible value. If that set of values
>>> was previously used for CPUs attached to a different node, it
>>> appears to applications as if these CPUs had migrated from one node
>>> to another, which is not expected in real life.
>>
>> This seems like a problem that could affect other architectures or
>> platforms? I guess as long as arch code is responsible for placing
>> new CPUs in cpu_present_mask, that code will have the responsibility
>> of ensuring CPU IDs' NUMA assignments remain stable.
>
> Actually, x86 already handles this issue in the arch-specific code,
> see commit 8f54969dc8d6 ("x86/acpi: Introduce persistent storage for
> cpuid <-> apicid mapping"). I didn't check other architectures, but
> since CPU id allocation lives in the arch code, I believe it is up to
> each arch to deal with this issue.
>
> Making the CPU id allocation common to all arches is outside the
> scope of this patch.
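
A minimal user-space sketch of the policy described above, in the
spirit of x86's persistent cpuid <-> apicid mapping: remember the node
each CPU id was first assigned to, prefer a free id recorded against
the arriving CPU's node, and fall back with a warning otherwise. This
is not kernel code and not the patch itself; NR_CPUS_SIM,
cpu_to_node[], and alloc_cpu_id() are illustrative names only.

#include <stdio.h>
#include <stdbool.h>

#define NR_CPUS_SIM 8
#define NODE_UNSET  (-1)

static int  cpu_to_node[NR_CPUS_SIM];   /* persistent id -> node map */
static bool cpu_present[NR_CPUS_SIM];   /* simulated cpu_present_mask */

static void init_maps(void)
{
	for (int i = 0; i < NR_CPUS_SIM; i++) {
		cpu_to_node[i] = NODE_UNSET;
		cpu_present[i] = false;
	}
}

/* Pick an id for a CPU arriving on @node; prefer a node-stable id. */
static int alloc_cpu_id(int node)
{
	int fallback = -1;

	for (int i = 0; i < NR_CPUS_SIM; i++) {
		if (cpu_present[i])
			continue;
		/* Best case: id is unused or was last on this node. */
		if (cpu_to_node[i] == NODE_UNSET || cpu_to_node[i] == node) {
			cpu_present[i] = true;
			cpu_to_node[i] = node;
			return i;
		}
		if (fallback < 0)
			fallback = i;
	}
	if (fallback >= 0) {
		/* The ABI-visible "CPU changed nodes" case the patch
		 * warns about. */
		fprintf(stderr,
			"warning: cpu %d moves from node %d to node %d\n",
			fallback, cpu_to_node[fallback], node);
		cpu_present[fallback] = true;
		cpu_to_node[fallback] = node;
		return fallback;
	}
	return -1; /* no free id at all */
}

int main(void)
{
	init_maps();
	int a = alloc_cpu_id(0);	/* id 0, node 0 */
	alloc_cpu_id(1);		/* id 1, node 1 */
	cpu_present[a] = false;		/* hot-remove the node-0 CPU */
	/* A second node-1 CPU gets id 2, not node-0's old id 0. */
	printf("new node-1 cpu gets id %d\n", alloc_cpu_id(1));
	return 0;
}
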
Well... we'd better avoid a situation where architectures impose
different policies in this area. (I guess we're already in this
situation, which is the problem.) A more maintainable way to achieve
that would be to put the higher-level policy in arch-independent code,
as much as possible. I don't insist, though.

>> I don't know, should we not fail the request instead of doing the
>> ABI-breaking thing the code in this change is trying to prevent? I
>> don't think a warning in the kernel log is going to help any
>> application that would be affected by this.
>
> That's a really good question. One could argue that the most
> important thing is to satisfy the CPU add operation, assuming that
> only a few users care about the CPU numbering, while others would
> prefer the CPU add to fail (which may prevent adding CPUs on other
> nodes if the whole operation is aborted as soon as one CPU add
> fails).
>
> I was conservative here, but if failing the operation is the best
> option, then that will make the code simpler, removing the backward
> jump.
>
> Who decides?

I favor failing the request. Apart from the implications for user
space, it's not clear to me that allowing the cpu-node relationship to
change once initialized is benign in terms of internal kernel
assumptions (e.g. sched domains, workqueues?). And as you say, it
would make for more straightforward code.
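
For contrast, a sketch of the stricter policy favoured above, reusing
the hypothetical maps from the previous sketch: refuse the hot-add
when every free id is recorded against a different node, so the
id <-> node mapping never changes once set. Returning -ENOSPC is an
illustrative choice, not an error code stated in the thread.

#include <errno.h>

/* Stricter variant: fail the hot-add rather than let an id change
 * nodes. */
static int alloc_cpu_id_strict(int node)
{
	for (int i = 0; i < NR_CPUS_SIM; i++) {
		if (cpu_present[i])
			continue;
		if (cpu_to_node[i] == NODE_UNSET || cpu_to_node[i] == node) {
			cpu_present[i] = true;
			cpu_to_node[i] = node;
			return i;
		}
	}
	/* Every remaining id is recorded against another node: reject
	 * the request so user space never sees a CPU "migrate". */
	return -ENOSPC;
}

As noted in the discussion, the strict variant also drops the fallback
path (the "backward jump"), at the cost of possibly aborting a larger
multi-CPU add operation partway through.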