On 26/05/2021 at 13:51, Jirka Hladky wrote:
Hi Brice,

I would like to get your opinion on the following issue. On an IBM LPAR, the kernel reports die_id and physical_package_id as -1.
See [0]

hwloc-calc converts these values into an unsigned integer, resulting in Socket ID 2^32-1:

hwloc-calc --physical-output --intersect socket core:0
4294967295
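The 4294967295 above is simply -1 stored in a 32-bit unsigned integer. Here is a minimal sketch of how that happens when the signed text the kernel writes to a sysfs topology file (for example physical_package_id) is parsed and then kept as `unsigned`. The helper names are hypothetical; this is not hwloc's actual parsing code.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse the contents of a sysfs topology file such as
 * /sys/devices/system/cpu/cpu0/topology/physical_package_id.
 * The kernel writes a signed decimal value, e.g. "-1\n" on the LPAR above. */
long parse_topology_id_signed(const char *text)
{
    return strtol(text, NULL, 10); /* stops at the trailing newline */
}

/* Hypothetical helper: the same value stored in an unsigned int.
 * Converting -1 to unsigned is well defined in C and yields UINT_MAX,
 * which is 4294967295 when unsigned int is 32 bits wide. */
unsigned parse_topology_id_unsigned(const char *text)
{
    return (unsigned)parse_topology_id_signed(text);
}
```

With a 32-bit `unsigned`, `parse_topology_id_unsigned("-1\n")` returns `UINT_MAX`, the same 4294967295 that hwloc-calc printed.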

I'm not quite sure why die_id and physical_package_id are set to -1. Perhaps it signals some error condition.


Hello Jirka

die_id might only be implemented on x86, since it was the only architecture that could expose multiple dies within a package when die topology info was added to recent kernels.

Package ID seems to be properly set on the POWER8/9 machines I have access to. Maybe the LPAR exposes a special/virtual topology, in which case die and package IDs wouldn't make sense?


I will try to find out. However, I think hwloc-calc should store the values as unsigned integers and represent them the same way as the kernel does. BTW, when using the hwloc API, I'm getting the correct values:

obj[0] = hwloc_get_pu_obj_by_os_index(topology, pu_hier);
obj[2] = hwloc_get_ancestor_obj_by_type(topology, HWLOC_OBJ_SOCKET, obj[0]);
obj[2]->os_index => -1

What are your thoughts?


os_index is already *unsigned* in the API (did you mean signed?). We cannot change obj->os_index back to signed now; it would break existing users.

But being signed wouldn't help much. -1 is the special value HWLOC_UNKNOWN_INDEX; it doesn't matter whether it's stored as -1 or 4294967295. Users shouldn't rely on these numbers anyway. However, debugging might be easier if tools printed that special value as -1 instead of 4294967295 (I'd need to check the other tools too; lstopo handles some of these values, maybe not all).
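A minimal sketch of the display change suggested above, assuming a tool that formats os_index values itself. hwloc defines HWLOC_UNKNOWN_INDEX as `(unsigned)-1`; the fallback `#define` below only exists so the sketch compiles without hwloc.h, and `format_os_index` is a hypothetical helper, not an hwloc function.

```c
#include <stdio.h>

/* Fallback so this sketch builds without hwloc installed; hwloc.h
 * provides the real HWLOC_UNKNOWN_INDEX, which is (unsigned)-1. */
#ifndef HWLOC_UNKNOWN_INDEX
#define HWLOC_UNKNOWN_INDEX ((unsigned)-1)
#endif

/* Hypothetical helper: format an object's os_index for display,
 * printing the "unknown" value as -1 instead of 4294967295.
 * Returns the number of characters snprintf would have written. */
int format_os_index(char *buf, size_t len, unsigned os_index)
{
    if (os_index == HWLOC_UNKNOWN_INDEX)
        return snprintf(buf, len, "-1");
    return snprintf(buf, len, "%u", os_index);
}
```

Because the comparison is done on the unsigned value, it does not matter whether the caller thinks of the special index as -1 or 4294967295: both are the same bit pattern.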

Brice

_______________________________________________
hwloc-devel mailing list
hwloc-devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-devel
