Re: [hwloc-devel] Negative values for die_id and physical_package_id

2021-05-26 Thread Brice Goglin

On 26/05/2021 at 15:23, Samuel Thibault wrote:

> Brice Goglin, on Wed, 26 May 2021 14:13:02 +0200, wrote:
>
>> os_index is already *unsigned* in the API (did you mean signed?). We cannot
>> change the obj->os_index back to signed now, it would break existing users.
>
> Mmm, it wouldn't break the ABI, only printf formats using %u?



Right. I tried the idea for hwloc 2.0, but I quickly stopped because of 
the loads of printf signedness warnings that our picky users were getting 
:/
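
To illustrate the kind of caller code involved, here is a minimal sketch. The helper name print_index is hypothetical and only stands for user code that prints obj->os_index; it is not part of hwloc. With the field unsigned, "%u" is the matching conversion. If the field became int, gcc's -Wformat-signedness option would flag every such line until it was changed to "%d".

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical caller-side helper: idx is unsigned, as obj->os_index is today. */
int print_index(char *buf, size_t len, unsigned idx)
{
    assert(buf != NULL && len > 0);
    /* "%u" matches an unsigned argument. Were idx declared as int,
     * gcc -Wformat-signedness would warn here until this became "%d". */
    return snprintf(buf, len, "%u", idx);
}
```

The warning is only about the format string; the object layout and the ABI stay the same either way.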


Brice




___
hwloc-devel mailing list
hwloc-devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-devel

Re: [hwloc-devel] Negative values for die_id and physical_package_id

2021-05-26 Thread Samuel Thibault
Brice Goglin, on Wed, 26 May 2021 14:13:02 +0200, wrote:
> os_index is already *unsigned* in the API (did you mean signed?). We cannot
> change the obj->os_index back to signed now, it would break existing users.

Mmm, it wouldn't break the ABI, only printf formats using %u?

Samuel


Re: [hwloc-devel] Negative values for die_id and physical_package_id

2021-05-26 Thread Jirka Hladky
Thanks a lot!

BTW, it seems that lstopo does not print indexes when the value is
HWLOC_UNKNOWN_INDEX:

$ lstopo --physical --allow all
Machine (31GB total)
  Package
NUMANode P#0 (26GB)
L3 (4096KB) + Core P#0
  L2 (1024KB) + L1d (32KB) + L1i (48KB)
Die + PU P#0
PU P#2
PU P#4
PU P#6
  L2 (1024KB) + L1d (32KB) + L1i (48KB)
PU P#1
PU P#3
PU P#5
PU P#7
  Group0
NUMANode P#3 (5354MB)
  Block(Disk) "sda"
  Net "env2"

On Wed, May 26, 2021 at 2:33 PM Brice Goglin wrote:

> On 26/05/2021 at 14:24, Jirka Hladky wrote:
>
>>> However maybe debugging would be easier if tools printed that special
>>> value as -1 instead of 4294967295 (I'd need to check other tools too,
>>> lstopo takes care of some of these values, maybe not all).
>>
>> I agree. So perhaps we can update the tools only, to print 4294967295 as
>> -1?
>
>
> Opened as https://github.com/open-mpi/hwloc/issues/468
>
> Hopefully we'll get some time to fix this before releasing 2.5.
>
> Brice
>
>



-- 
-Jirka

Re: [hwloc-devel] Negative values for die_id and physical_package_id

2021-05-26 Thread Brice Goglin

On 26/05/2021 at 14:24, Jirka Hladky wrote:

>> However maybe debugging would be easier if tools printed that
>> special value as -1 instead of 4294967295 (I'd need to check other
>> tools too, lstopo takes care of some of these values, maybe not all).
>
> I agree. So perhaps we can update the tools only, to print 4294967295
> as -1?



Opened as https://github.com/open-mpi/hwloc/issues/468

Hopefully we'll get some time to fix this before releasing 2.5.

Brice





Re: [hwloc-devel] Negative values for die_id and physical_package_id

2021-05-26 Thread Jirka Hladky
Hi Brice,

thanks for looking into this!

> Package ID seems to be properly set on POWER8/9 machines I have access to.
> Maybe something related to LPAR exposing a special/virtual topology hence
> die and package ID wouldn't make sense?

Yes, it sounds plausible. I will try to find out.

> os_index is already *unsigned* in the API (did you mean signed?)

My bad. My C code using hwloc API was printing -1. It's because I store
"obj[2]->os_index" as signed integer, so it gets converted back to -1.
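
As a small illustration of that effect: assigning an unsigned value above INT_MAX to an int is implementation-defined in C, and on the usual two's-complement platforms 4294967295 comes out as -1. The sketch below makes the mapping explicit instead of relying on that conversion. The helper name index_as_int is hypothetical, not an hwloc function.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: map an unsigned index to int, returning -1 for any
 * value that does not fit, which includes (unsigned) -1. */
int index_as_int(unsigned os_index)
{
    int result = (os_index > (unsigned) INT_MAX) ? -1 : (int) os_index;
    assert(result >= -1); /* only -1 or a non-negative index can come out */
    return result;
}
```

Calling index_as_int(obj->os_index) would then give -1 for an unknown index without depending on how the compiler converts out-of-range values.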

> We cannot change the obj->os_index back to signed now, it would break
> existing users.

Sure, I understand that.

> However maybe debugging would be easier if tools printed that special
> value as -1 instead of 4294967295 (I'd need to check other tools too,
> lstopo takes care of some of these values, maybe not all).

I agree. So perhaps we can update the tools only, to print 4294967295 as -1?

Thanks!
Jirka

On Wed, May 26, 2021 at 2:14 PM Brice Goglin wrote:

> On 26/05/2021 at 13:51, Jirka Hladky wrote:
>
>> Hi Brice,
>>
>> I would like to get your opinion on the following issue. On IBM LPAR,
>> the kernel reports die_id and physical_package_id to be -1. See [0]
>>
>> hwloc-calc converts these values into an unsigned integer, resulting in
>> Socket ID 2^32-1:
>>
>> hwloc-calc --physical-output --intersect socket core:0
>> 4294967295
>>
>> I'm not quite sure why die_id and physical_package_id are set to -1.
>> Perhaps it signals some error condition.
>
> Hello Jirka
>
> die_id might only be implemented on x86 since it was the only architecture
> that could expose different dies within packages when die topology info was
> added to recent kernels.
>
> Package ID seems to be properly set on POWER8/9 machines I have access to.
> Maybe something related to LPAR exposing a special/virtual topology hence
> die and package ID wouldn't make sense?
>
>> I will try to find out. However, I think that hwloc-calc should store the
>> values as unsigned integers and represent them the same way as the kernel.
>> BTW, when using hwloc API, I'm getting the correct values:
>>
>> obj[0] = hwloc_get_pu_obj_by_os_index(topology, pu_hier]);
>> obj[2] = hwloc_get_ancestor_obj_by_type (topology, HWLOC_OBJ_SOCKET,
>> obj[0]);
>> obj[2]->os_index => -1
>>
>> What are your thoughts?
>
> os_index is already *unsigned* in the API (did you mean signed?). We
> cannot change the obj->os_index back to signed now, it would break existing
> users.
>
> But being signed wouldn't help much. -1 is the special value
> HWLOC_UNKNOWN_INDEX, it doesn't matter if it's stored as -1 or 4294967295.
> Users shouldn't rely on these numbers anyway. However maybe debugging would
> be easier if tools printed that special value as -1 instead of 4294967295
> (I'd need to check other tools too, lstopo takes care of some of these
> values, maybe not all).
>
> Brice



-- 
-Jirka

Re: [hwloc-devel] Negative values for die_id and physical_package_id

2021-05-26 Thread Brice Goglin

On 26/05/2021 at 13:51, Jirka Hladky wrote:

> Hi Brice,
>
> I would like to get your opinion on the following issue. On IBM LPAR,
> the kernel reports die_id and physical_package_id to be -1. See [0]
>
> hwloc-calc converts these values into an unsigned integer, resulting
> in Socket ID 2^32-1:
>
> hwloc-calc --physical-output --intersect socket core:0
> 4294967295
>
> I'm not quite sure why die_id and physical_package_id are set to -1.
> Perhaps it signals some error condition.

Hello Jirka

die_id might only be implemented on x86 since it was the only
architecture that could expose different dies within packages when die
topology info was added to recent kernels.

Package ID seems to be properly set on POWER8/9 machines I have access
to. Maybe something related to LPAR exposing a special/virtual topology
hence die and package ID wouldn't make sense?

> I will try to find out. However, I think that hwloc-calc should store
> the values as unsigned integers and represent them the same way as
> the kernel. BTW, when using hwloc API, I'm getting the correct values:
>
> obj[0] = hwloc_get_pu_obj_by_os_index(topology, pu_hier]);
> obj[2] = hwloc_get_ancestor_obj_by_type (topology, HWLOC_OBJ_SOCKET,
> obj[0]);
>
> obj[2]->os_index => -1
>
> What are your thoughts?

os_index is already *unsigned* in the API (did you mean signed?). We
cannot change the obj->os_index back to signed now, it would break
existing users.


But being signed wouldn't help much. -1 is the special value 
HWLOC_UNKNOWN_INDEX, it doesn't matter if it's stored as -1 or 
4294967295. Users shouldn't rely on these numbers anyway. However maybe 
debugging would be easier if tools printed that special value as -1 
instead of 4294967295 (I'd need to check other tools too, lstopo takes 
care of some of these values, maybe not all).
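
A tool-side fix along those lines could look roughly like the sketch below. It is only an illustration, not the actual lstopo or hwloc-calc code: format_os_index is a hypothetical helper, and UNKNOWN_INDEX is a local stand-in for HWLOC_UNKNOWN_INDEX so the snippet compiles without hwloc.h.

```c
#include <assert.h>
#include <stdio.h>

/* Local stand-in for hwloc's HWLOC_UNKNOWN_INDEX (assumed to be (unsigned) -1). */
#define UNKNOWN_INDEX ((unsigned) -1)

/* Hypothetical helper: write an OS index into buf, showing the unknown
 * marker as "-1" instead of 4294967295. Returns what snprintf returns. */
int format_os_index(char *buf, size_t len, unsigned os_index)
{
    assert(buf != NULL && len > 0);
    if (os_index == UNKNOWN_INDEX)
        return snprintf(buf, len, "-1");
    return snprintf(buf, len, "%u", os_index);
}
```

Only the printed text changes; the stored value and the API type remain unsigned.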


Brice



[hwloc-devel] Negative values for die_id and physical_package_id

2021-05-26 Thread Jirka Hladky
Hi Brice,

I would like to get your opinion on the following issue. On IBM LPAR,
the kernel reports die_id and physical_package_id to be -1. See [0]

hwloc-calc converts these values into an unsigned integer, resulting in
Socket ID 2^32-1:

hwloc-calc --physical-output --intersect socket core:0
4294967295

I'm not quite sure why die_id and physical_package_id are set to -1.
Perhaps it signals some error condition. I will try to find out.
However, I think that hwloc-calc should store the values as unsigned
integers and represent them the same way as the kernel. BTW, when using hwloc
API, I'm getting the correct values:

obj[0] = hwloc_get_pu_obj_by_os_index(topology, pu_hier]);
obj[2] = hwloc_get_ancestor_obj_by_type (topology, HWLOC_OBJ_SOCKET,
obj[0]);
obj[2]->os_index => -1

What are your thoughts?

Thanks a lot!
Jirka


[0]
/sys/devices/system/cpu/cpu1/topology
$ cat die_id
-1

$ cat physical_package_id
-1
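
For reference, one way such a sysfs value could be read into an unsigned index is sketched below. This is an assumption for illustration, not hwloc's actual parsing code: parse_topology_id is a hypothetical helper and UNKNOWN_INDEX is a local stand-in for HWLOC_UNKNOWN_INDEX. Note that strtoul() accepts a leading minus sign, so parsing "-1" with it and truncating to unsigned would also yield 4294967295 on platforms with a 32-bit unsigned int; the sketch uses strtol() and maps negative values explicitly.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Local stand-in for hwloc's HWLOC_UNKNOWN_INDEX (assumed to be (unsigned) -1). */
#define UNKNOWN_INDEX ((unsigned) -1)

/* Hypothetical parser for the text of a sysfs topology file such as
 * "-1\n" or "3\n". Negative, empty, or out-of-range input maps to
 * UNKNOWN_INDEX. */
unsigned parse_topology_id(const char *text)
{
    char *end;
    long value;

    assert(text != NULL);
    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || errno != 0 || value < 0
        || (unsigned long) value >= UNKNOWN_INDEX)
        return UNKNOWN_INDEX;
    return (unsigned) value;
}
```

With the files shown above, both die_id and physical_package_id would parse to UNKNOWN_INDEX.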