Hi Brice, thanks for looking into this!

> Package ID seems to be properly set on POWER8/9 machines I have access to.
> Maybe something related to LPAR exposing a special/virtual topology hence
> die and package ID wouldn't make sense?

Yes, it sounds plausible. I will try to find out.

> os_index is already *unsigned* in the API (did you mean signed?)

My bad. My C code using the hwloc API was printing -1. That's because I store
"obj[2]->os_index" as a signed integer, so it gets converted back to -1.

> We cannot change the obj->os_index back to signed now, it would break
> existing users.

Sure, I understand that.

> However maybe debugging would be easier if tools printed that special
> value as -1 instead of 4294967295 (I'd need to check other tools too,
> lstopo takes care of some of these values, maybe not all).

I agree. So perhaps we can update the tools only, to print 4294967295 as -1?

Thanks!
Jirka

On Wed, May 26, 2021 at 2:14 PM Brice Goglin <brice.gog...@inria.fr> wrote:

> On 26/05/2021 at 13:51, Jirka Hladky wrote:
>
> > Hi Brice,
> >
> > I would like to get your opinion on the following issue. On IBM LPAR, the
> > kernel reports die_id and physical_package_id to be -1. See [0]
> >
> > hwloc-calc converts these values into an unsigned integer, resulting in
> > Socket ID 2^32-1:
> >
> > hwloc-calc --physical-output --intersect socket core:0
> > 4294967295
> >
> > I'm not quite sure why die_id and physical_package_id are set to -1.
> > Perhaps it signals some error condition.
>
> Hello Jirka
>
> die_id might be only implemented on x86 since it was the only architecture
> that could expose different dies within packages when die topology info was
> added to recent kernels.
>
> Package ID seems to be properly set on POWER8/9 machines I have access to.
> Maybe something related to LPAR exposing a special/virtual topology hence
> die and package ID wouldn't make sense?
>
> > I will try to find out. However, I think that hwloc-calc should store the
> > values as unsigned integers and represent them the same way as the kernel.
> > BTW, when using the hwloc API, I'm getting the correct values:
> >
> > obj[0] = hwloc_get_pu_obj_by_os_index(topology, pu_hier]);
> > obj[2] = hwloc_get_ancestor_obj_by_type (topology, HWLOC_OBJ_SOCKET,
> > obj[0]);
> > obj[2]->os_index => -1
> >
> > What are your thoughts?
>
> os_index is already *unsigned* in the API (did you mean signed?). We
> cannot change the obj->os_index back to signed now, it would break existing
> users.
>
> But being signed wouldn't help much. -1 is the special value
> HWLOC_UNKNOWN_INDEX, it doesn't matter if it's stored as -1 or 4294967295.
> Users shouldn't rely on these numbers anyway. However maybe debugging would
> be easier if tools printed that special value as -1 instead of 4294967295
> (I'd need to check other tools too, lstopo takes care of some of these
> values, maybe not all).
>
> Brice
> _______________________________________________
> hwloc-devel mailing list
> hwloc-devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/hwloc-devel

--
-Jirka