> > When build_cpu_topo() encounters offline/absent CPUs,
> > it fails to find any sysfs entries and returns failure.
> > This leads to build_cpu_topology() and write_cpu_topology()
> > failing as well.
> >
> > Because HEADER_CPU_TOPOLOGY has not been written, read leaves
> > cpu_topology_map NULL and we get NULL ptr deref at:
> >
> >   ...
> >   cmd_test
> >   __cmd_test
> >   test_and_print
> >   run_test
> >   test_session_topology
> >   check_cpu_topology
>
> So IIUIC that's the key issue here.. write_cpu_topology that fails
> to write the TOPO data and following readers crashing on processing
> uncomplete data? if thats the case write_cpu_topology needs to
> be fixed, instead of doing workarounds

It's already too late by the time you are in write_cpu_topology():
build_cpu_topology() has returned NULL, so there's nothing to write.
That's why the patch aims to fix this in build_cpu_topology().

>
> SNIP
>
> >  	u32 nr, i;
> >  	size_t sz;
> >  	long ncpus;
> > -	int ret = -1;
> > +	int ret = 0;
> > +	struct cpu_map *map;
> >
> >  	ncpus = sysconf(_SC_NPROCESSORS_CONF);
> >  	if (ncpus < 0)
> > -		return NULL;
> > +		goto out;
>
> can just return NULL
>
> > +
> > +	/* build online CPU map */
> > +	map = cpu_map__new(NULL);
> > +	if (map == NULL) {
> > +		pr_debug("failed to get system cpumap\n");
> > +		goto out;
> > +	}
> >
> >  	nr = (u32)(ncpus & UINT_MAX);
> >
> >  	sz = nr * sizeof(char *);
> > -
> >  	addr = calloc(1, sizeof(*tp) + 2 * sz);
> >  	if (!addr)
> > -		return NULL;
> > +		goto out_free;
> >
> >  	tp = addr;
> >  	tp->cpu_nr = nr;
> > @@ -530,14 +537,21 @@ static struct cpu_topo *build_cpu_topology(void)
> >  	tp->thread_siblings = addr;
> >
> >  	for (i = 0; i < nr; i++) {
> > +		if (!cpu_map__has(map, i))
> > +			continue;
> > +
>
> so this prevents build_cpu_topo to fail due to missing topology
> info because cpu is offline.. can it fail for other reasons?

It's unlikely, though I suppose that if you couldn't open and read
something from sysfs (say sysfs is not mounted), it can fail for an
online CPU too.

> >
> >  		ret = build_cpu_topo(tp, i);
> >  		if (ret < 0)
> >  			break;
>
> SNIP
>
> For example:
>   _SC_NPROCESSORS_CONF == 16
>   available: 2 nodes (0-1)
>   node 0 cpus: 0 6 8 10 16 22 24 26
>   node 0 size: 12004 MB
>   node 0 free: 9470 MB
>   node 1 cpus: 1 7 9 11 23 25 27
>   node 1 size: 12093 MB
>   node 1 free: 9406 MB
>   node distances:
>   node   0   1
>     0:  10  20
>     1:  20  10
>
> so what's max_present_cpu in this example?

It's 28, which is the number of core_id/socket_id entries, for
CPUs 0 up to 27.

Regards,
Jan
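
To make the arithmetic behind the "28" answer concrete, here is a minimal
sketch. It is not the perf implementation: max_present_cpu_sketch() and
example_cpus are hypothetical names used only for illustration, and the
sketch assumes max_present_cpu simply means "highest CPU id plus one".

```c
#include <stddef.h>

/*
 * Hypothetical helper (an assumption, not perf code): return the highest
 * CPU id in the list plus one, i.e. the number of slots needed when
 * per-CPU arrays such as core_id/socket_id are indexed directly by CPU id.
 * Returns 0 for an empty list.
 */
static int max_present_cpu_sketch(const int *cpus, size_t n)
{
	int max = -1;
	size_t i;

	for (i = 0; i < n; i++)
		if (cpus[i] > max)
			max = cpus[i];

	return max + 1;
}

/* CPU ids taken from the node listing quoted above (nodes 0 and 1 merged). */
static const int example_cpus[] = {
	0, 6, 8, 10, 16, 22, 24, 26,	/* node 0 */
	1, 7, 9, 11, 23, 25, 27,	/* node 1 */
};

#define EXAMPLE_NR_CPUS (sizeof(example_cpus) / sizeof(example_cpus[0]))
```

Calling max_present_cpu_sketch(example_cpus, EXAMPLE_NR_CPUS) on this list
gives 28, although only 15 ids are listed and _SC_NPROCESSORS_CONF reports
16. The CPU ids are sparse, so an array sized by the CPU count would be too
small to index by CPU id 27.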