> 
> > When build_cpu_topo() encounters offline/absent CPUs,
> > it fails to find any sysfs entries and returns failure.
> > This leads to build_cpu_topology() and write_cpu_topology()
> > failing as well.
> > 
> > Because HEADER_CPU_TOPOLOGY has not been written, read leaves
> > cpu_topology_map NULL and we get NULL ptr deref at:
> > 
> >  ...
> >   cmd_test
> >    __cmd_test
> >     test_and_print
> >      run_test
> >       test_session_topology
> >        check_cpu_topology
> 
> So IIUIC that's the key issue here: write_cpu_topology fails
> to write the TOPO data and the readers that follow crash on processing
> incomplete data? If that's the case, write_cpu_topology needs to
> be fixed instead of doing workarounds.

It's already too late by the time you are in write_cpu_topology(),
because build_cpu_topology() has returned NULL and there's nothing to write.
That's why the patch aims to fix this in build_cpu_topology().
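
To illustrate the point, here is a toy model of that loop. It is only a
sketch, not the perf code: toy_build_cpu_topo() and
toy_build_cpu_topology() are hypothetical stand-ins, and the online[]
array stands in for the cpu_map. Without the online check, the first
offline CPU fails the whole build, so the writer has nothing to write.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for build_cpu_topo(): fails for an offline CPU,
 * the way the real function fails when the sysfs entries are missing.
 */
static int toy_build_cpu_topo(const bool *online, size_t cpu)
{
	return online[cpu] ? 0 : -1;
}

/*
 * Hypothetical stand-in for the loop in build_cpu_topology().
 * Returns the number of CPUs processed, or -1 on the first failure.
 */
static int toy_build_cpu_topology(const bool *online, size_t nr,
				  bool skip_offline)
{
	int built = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (skip_offline && !online[i])
			continue;	/* the kind of check the patch adds */

		if (toy_build_cpu_topo(online, i) < 0)
			return -1;	/* whole build fails, nothing to write */

		built++;
	}

	return built;
}
```

With CPU 1 offline out of four, the call without the check returns -1,
while the call with the check processes the three online CPUs.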

> 
> SNIP
> 
> >     u32 nr, i;
> >     size_t sz;
> >     long ncpus;
> > -   int ret = -1;
> > +   int ret = 0;
> > +   struct cpu_map *map;
> >  
> >     ncpus = sysconf(_SC_NPROCESSORS_CONF);
> >     if (ncpus < 0)
> > -           return NULL;
> > +           goto out;
> 
> can just return NULL
> 
> > +
> > +   /* build online CPU map */
> > +   map = cpu_map__new(NULL);
> > +   if (map == NULL) {
> > +           pr_debug("failed to get system cpumap\n");
> > +           goto out;
> > +   }
> >  
> >     nr = (u32)(ncpus & UINT_MAX);
> >  
> >     sz = nr * sizeof(char *);
> > -
> >     addr = calloc(1, sizeof(*tp) + 2 * sz);
> >     if (!addr)
> > -           return NULL;
> > +           goto out_free;
> >  
> >     tp = addr;
> >     tp->cpu_nr = nr;
> > @@ -530,14 +537,21 @@ static struct cpu_topo *build_cpu_topology(void)
> >     tp->thread_siblings = addr;
> >  
> >     for (i = 0; i < nr; i++) {
> > +           if (!cpu_map__has(map, i))
> > +                   continue;
> > +
> 
> so this prevents build_cpu_topo from failing due to missing topology
> info because the cpu is offline.. can it fail for other reasons?

It's unlikely, though I suppose if you couldn't open and read something
from sysfs (say sysfs is not mounted) it can fail for an online CPU too.
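
Roughly, that failure mode looks like the sketch below. read_first_line()
is a hypothetical helper, not what perf uses; it only shows that a failed
open or read has to be reported to the caller no matter whether the CPU
is online.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read the first line of a file into buf.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
static int read_first_line(const char *path, char *buf, size_t sz)
{
	FILE *fp;

	if (sz == 0)
		return -1;

	fp = fopen(path, "r");
	if (!fp)
		return -1;	/* e.g. sysfs not mounted, entry missing */

	if (!fgets(buf, (int)sz, fp)) {
		fclose(fp);
		return -1;	/* opened, but nothing could be read */
	}
	fclose(fp);

	buf[strcspn(buf, "\n")] = '\0';	/* strip the trailing newline */
	return 0;
}
```

A caller would pass the path of a per-CPU topology file and treat -1 as
"no topology info for this CPU".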

> 
> 
> >             ret = build_cpu_topo(tp, i);
> >             if (ret < 0)
> >                     break;
> 

SNIP

> For example:
>   _SC_NPROCESSORS_CONF == 16
>   available: 2 nodes (0-1)
>   node 0 cpus: 0 6 8 10 16 22 24 26
>   node 0 size: 12004 MB
>   node 0 free: 9470 MB
>   node 1 cpus: 1 7 9 11 23 25 27
>   node 1 size: 12093 MB
>   node 1 free: 9406 MB
>   node distances:
>   node   0   1
>     0:  10  20
>     1:  20  10
> so what's max_present_cpu in this example?

It's 28, which is the number of core_id/socket_id entries,
for CPUs 0 up to 27.
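
As a rough illustration of that arithmetic (a sketch only;
max_cpu_from_list() is a hypothetical helper, not the perf
implementation), taking the highest CPU id in a list and adding one:

```c
#include <stdlib.h>

/*
 * Hypothetical helper: given a CPU list such as "0-1,6-11,16,22-27",
 * return the highest CPU id plus one, or -1 on malformed/empty input.
 */
static int max_cpu_from_list(const char *list)
{
	const char *p = list;
	long max = -1;

	while (*p && *p != '\n') {
		char *end;
		long v = strtol(p, &end, 10);

		if (end == p || v < 0)
			return -1;	/* not a number, or negative */
		if (v > max)
			max = v;

		p = end;
		if (*p == '-' || *p == ',')
			p++;		/* skip range or list separator */
		else if (*p && *p != '\n')
			return -1;	/* unexpected character */
	}

	return max < 0 ? -1 : (int)(max + 1);
}
```

Feeding it the CPU ids from the example above, written as
"0-1,6-11,16,22-27", gives 28, even though only 15 CPUs are listed.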

Regards,
Jan
