Hi Brice,

Thanks for looking into this!

> Package ID seems to be properly set on POWER8/9 machines I have access to.
> Maybe something related to LPAR exposing a special/virtual topology hence
> die and package ID wouldn't make sense?

Yes, it sounds plausible. I will try to find out.

> os_index is already *unsigned* in the API (did you mean signed?)

My bad. My C code using the hwloc API was printing -1 because I store
"obj[2]->os_index" in a signed integer, so 4294967295 gets converted back to -1.
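
To illustrate what happened on my side, here is a minimal sketch. It does not use hwloc at all: "struct fake_obj" and both function names are made up for this example, and it assumes a 32-bit unsigned int. The conversion to int is implementation-defined in C, but on the usual two's-complement platforms it gives -1.

```c
#include <stdio.h>

/* Hypothetical stand-in for the unsigned os_index field of an hwloc object. */
struct fake_obj {
    unsigned os_index;
};

/* Copy os_index into a signed int, as my code did. For values above INT_MAX
 * the result is implementation-defined; on the usual two's-complement
 * platforms 4294967295 becomes -1. */
static int os_index_as_int(const struct fake_obj *obj)
{
    return (int)obj->os_index;
}

/* Print the same index both ways to show the difference. */
static void print_both(const struct fake_obj *obj)
{
    printf("unsigned: %u, signed: %d\n", obj->os_index, os_index_as_int(obj));
}
```

Calling print_both() on an object whose os_index is 4294967295 shows the unsigned value next to the signed one, which is why my program and hwloc-calc disagreed.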

> We cannot change the obj->os_index back to signed now, it would break
> existing users.

Sure, I understand that.

> However maybe debugging would be easier if tools printed that special
> value as -1 instead of 4294967295 (I'd need to check other tools too,
> lstopo takes care of some of these values, maybe not all).

I agree. So perhaps we can update the tools only, to print 4294967295 as -1?
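
Something along these lines is what I have in mind, as a rough sketch only. "format_os_index" is a hypothetical helper, not an existing hwloc function, and UNKNOWN_INDEX is a local macro that I assume matches HWLOC_UNKNOWN_INDEX, i.e. (unsigned)-1. I have not looked at how lstopo handles this today.

```c
#include <stdio.h>

/* Local macro, assumed to match hwloc's HWLOC_UNKNOWN_INDEX, i.e. (unsigned)-1. */
#define UNKNOWN_INDEX ((unsigned)-1)

/* Hypothetical helper, not an hwloc function: write os_index into buf,
 * showing the unknown marker as "-1" rather than 4294967295.
 * Returns whatever snprintf returns. */
static int format_os_index(unsigned os_index, char *buf, size_t len)
{
    if (os_index == UNKNOWN_INDEX)
        return snprintf(buf, len, "-1");
    return snprintf(buf, len, "%u", os_index);
}
```

The stored value stays unsigned, so nothing changes for existing API users. Only the text shown to the user differs.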

Thanks!
Jirka

On Wed, May 26, 2021 at 2:14 PM Brice Goglin <brice.gog...@inria.fr> wrote:

> On 26/05/2021 at 13:51, Jirka Hladky wrote:
>
> Hi Brice,
>
> I would like to get your opinion on the following issue. On an IBM LPAR,
> the kernel reports die_id and physical_package_id as -1. See [0]
>
> hwloc-calc converts these values into an unsigned integer, resulting in
> Socket ID 2^32-1:
>
> hwloc-calc --physical-output --intersect socket core:0
> 4294967295
>
> I'm not quite sure why die_id and physical_package_id are set to -1.
> Perhaps it signals some error condition.
>
>
> Hello Jirka
>
> die_id might be only implemented on x86 since it was the only architecture
> that could expose different dies within packages when die topology info was
> added to recent kernels.
>
> Package ID seems to be properly set on POWER8/9 machines I have access to.
> Maybe something related to LPAR exposing a special/virtual topology hence
> die and package ID wouldn't make sense?
>
> I will try to find out. However, I think that hwloc-calc should store the
> values as unsigned integers and represent them the same way as the kernel.
> BTW, when using hwloc API, I'm getting the correct values:
>
> obj[0] = hwloc_get_pu_obj_by_os_index(topology, pu_hier]);
> obj[2] = hwloc_get_ancestor_obj_by_type (topology, HWLOC_OBJ_SOCKET,
> obj[0]);
> obj[2]->os_index => -1
>
> What are your thoughts?
>
>
> os_index is already *unsigned* in the API (did you mean signed?). We
> cannot change the obj->os_index back to signed now, it would break existing
> users.
>
> But being signed wouldn't help much. -1 is the special value
> HWLOC_UNKNOWN_INDEX, it doesn't matter if it's stored as -1 or 4294967295.
> Users shouldn't rely on these numbers anyway. However maybe debugging would
> be easier if tools printed that special value as -1 instead of 4294967295
> (I'd need to check other tools too, lstopo takes care of some of these
> values, maybe not all).
> Brice
> _______________________________________________
> hwloc-devel mailing list
> hwloc-devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/hwloc-devel



-- 
-Jirka
