Re: [hwloc-devel] Negative values for die_id and physical_package_id
On 26/05/2021 at 15:23, Samuel Thibault wrote:
> Brice Goglin, on Wed. 26 May 2021 14:13:02 +0200, wrote:
>> os_index is already *unsigned* in the API (did you mean signed?). We cannot
>> change the obj->os_index back to signed now, it would break existing users.
>
> Mmm, it wouldn't break the ABI, only printf formats using %u?

Right. I tried the idea for hwloc 2.0, but I quickly stopped because of the loads of printf signedness warnings that our picky users were getting :/

Brice

___
hwloc-devel mailing list
hwloc-devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-devel
Re: [hwloc-devel] Negative values for die_id and physical_package_id
Brice Goglin, on Wed. 26 May 2021 14:13:02 +0200, wrote:
> os_index is already *unsigned* in the API (did you mean signed?). We cannot
> change the obj->os_index back to signed now, it would break existing users.

Mmm, it wouldn't break the ABI, only printf formats using %u?

Samuel
Re: [hwloc-devel] Negative values for die_id and physical_package_id
Thank you very much!

BTW, it seems that lstopo does not print indexes when the value is HWLOC_UNKNOWN_INDEX:

$ lstopo --physical --allow all
Machine (31GB total) Package NUMANode P#0 (26GB) L3 (4096KB) + Core P#0 L2 (1024KB) + L1d (32KB) + L1i (48KB) Die + PU P#0 PU P#2 PU P#4 PU P#6 L2 (1024KB) + L1d (32KB) + L1i (48KB) PU P#1 PU P#3 PU P#5 PU P#7 Group0 NUMANode P#3 (5354MB) Block(Disk) "sda" Net "env2"

On Wed, May 26, 2021 at 2:33 PM Brice Goglin wrote:
> On 26/05/2021 at 14:24, Jirka Hladky wrote:
>
>>> However maybe debugging would be easier if tools printed that special
>>> value as -1 instead of 4294967295 (I'd need to check other tools too,
>>> lstopo takes care of some of these values, maybe not all).
>>
>> I agree. So perhaps we can update the tools only, to print 4294967295 as -1?
>
> Opened as https://github.com/open-mpi/hwloc/issues/468
>
> Hopefully we'll get some time to fix this before releasing 2.5.
>
> Brice

--
-Jirka
Re: [hwloc-devel] Negative values for die_id and physical_package_id
On 26/05/2021 at 14:24, Jirka Hladky wrote:
>> However maybe debugging would be easier if tools printed that special
>> value as -1 instead of 4294967295 (I'd need to check other tools too,
>> lstopo takes care of some of these values, maybe not all).
>
> I agree. So perhaps we can update the tools only, to print 4294967295 as -1?

Opened as https://github.com/open-mpi/hwloc/issues/468

Hopefully we'll get some time to fix this before releasing 2.5.

Brice
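As an illustration of the tools-side change discussed above, here is a minimal sketch of formatting an index so that the unknown value is shown as -1. The helper name format_os_index is hypothetical, and UNKNOWN_INDEX is a local stand-in assumed to mirror hwloc's HWLOC_UNKNOWN_INDEX, i.e. (unsigned)-1. This is not the actual change made for issue 468.

```c
#include <stdio.h>
#include <stddef.h>

/* Local stand-in, assumed to mirror hwloc's HWLOC_UNKNOWN_INDEX. */
#define UNKNOWN_INDEX ((unsigned)-1)

/* Hypothetical helper: write idx into buf, showing the unknown value as "-1"
 * instead of 4294967295. Returns the snprintf() result, i.e. the number of
 * characters needed (excluding the terminating NUL). */
static int format_os_index(unsigned idx, char *buf, size_t len)
{
    if (idx == UNKNOWN_INDEX)
        return snprintf(buf, len, "-1");
    return snprintf(buf, len, "%u", idx);
}
```

A caller would do something like format_os_index(obj->os_index, buf, sizeof(buf)) and then print buf with %s, which keeps the unsigned type in the API while making the debug output easier to read.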
Re: [hwloc-devel] Negative values for die_id and physical_package_id
Hi Brice,

thanks for looking into this!

> Package ID seems to be properly set on POWER8/9 machines I have access to.
> Maybe something related to LPAR exposing a special/virtual topology hence
> die and package ID wouldn't make sense?

Yes, it sounds plausible. I will try to find out.

> os_index is already *unsigned* in the API (did you mean signed?)

My bad. My C code using the hwloc API was printing -1. That is because I store "obj[2]->os_index" in a signed integer, so it gets converted back to -1.

> We cannot change the obj->os_index back to signed now, it would break
> existing users.

Sure, I understand that.

> However maybe debugging would be easier if tools printed that special
> value as -1 instead of 4294967295 (I'd need to check other tools too,
> lstopo takes care of some of these values, maybe not all).

I agree. So perhaps we can update the tools only, to print 4294967295 as -1?

Thanks!
Jirka

On Wed, May 26, 2021 at 2:14 PM Brice Goglin wrote:
> On 26/05/2021 at 13:51, Jirka Hladky wrote:
>
>> Hi Brice,
>>
>> I would like to get your opinion on the following issue. On IBM LPAR, the
>> kernel reports die_id and physical_package_id to be -1. See [0]
>>
>> hwloc-calc converts these values into an unsigned integer, resulting in
>> Socket ID 2^32-1:
>>
>> hwloc-calc --physical-output --intersect socket core:0
>> 4294967295
>>
>> I'm not quite sure why die_id and physical_package_id are set to -1.
>> Perhaps it signals some error condition.
>
> Hello Jirka
>
> die_id might be only implemented on x86 since it was the only architecture
> that could expose different dies within packages when die topology info was
> added to recent kernels.
>
> Package ID seems to be properly set on POWER8/9 machines I have access to.
> Maybe something related to LPAR exposing a special/virtual topology hence
> die and package ID wouldn't make sense?
>
>> I will try to find out. However, I think that hwloc-calc should store the
>> values as unsigned integers and represent them the same way as the kernel.
>> BTW, when using the hwloc API, I'm getting the correct values:
>>
>> obj[0] = hwloc_get_pu_obj_by_os_index(topology, pu_hier]);
>> obj[2] = hwloc_get_ancestor_obj_by_type (topology, HWLOC_OBJ_SOCKET, obj[0]);
>> obj[2]->os_index => -1
>>
>> What are your thoughts?
>
> os_index is already *unsigned* in the API (did you mean signed?). We
> cannot change the obj->os_index back to signed now, it would break existing
> users.
>
> But being signed wouldn't help much. -1 is the special value
> HWLOC_UNKNOWN_INDEX, it doesn't matter if it's stored as -1 or 4294967295.
> Users shouldn't rely on these numbers anyway. However maybe debugging would
> be easier if tools printed that special value as -1 instead of 4294967295
> (I'd need to check other tools too, lstopo takes care of some of these
> values, maybe not all).
>
> Brice

--
-Jirka
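The -1 versus 4294967295 confusion in this message comes down to C integer conversions. Below is a small sketch, assuming a 32-bit unsigned int as on common Linux platforms. The function names are made up for illustration, and the code only demonstrates the conversions; it does not show how hwloc itself reads or stores the value.

```c
#include <limits.h>

/* Converting -1 to unsigned wraps modulo UINT_MAX + 1, giving UINT_MAX
 * (4294967295 when unsigned int is 32 bits wide). This is well defined. */
static unsigned index_from_kernel_id(int kernel_id)
{
    return (unsigned)kernel_id;
}

/* Storing the unsigned index back into an int. For values above INT_MAX the
 * result is implementation-defined before C23; on two's-complement compilers
 * such as GCC and Clang, UINT_MAX comes back as -1. */
static int index_as_signed(unsigned os_index)
{
    return (int)os_index;
}
```

This is why code that copies obj->os_index into an int prints -1, while tools that print the unsigned field directly show 4294967295: both are the same bit pattern.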
Re: [hwloc-devel] Negative values for die_id and physical_package_id
On 26/05/2021 at 13:51, Jirka Hladky wrote:
> Hi Brice,
>
> I would like to get your opinion on the following issue. On IBM LPAR, the
> kernel reports die_id and physical_package_id to be -1. See [0]
>
> hwloc-calc converts these values into an unsigned integer, resulting in
> Socket ID 2^32-1:
>
> hwloc-calc --physical-output --intersect socket core:0
> 4294967295
>
> I'm not quite sure why die_id and physical_package_id are set to -1.
> Perhaps it signals some error condition.

Hello Jirka

die_id might be only implemented on x86 since it was the only architecture that could expose different dies within packages when die topology info was added to recent kernels.

Package ID seems to be properly set on POWER8/9 machines I have access to. Maybe something related to LPAR exposing a special/virtual topology hence die and package ID wouldn't make sense?

> I will try to find out. However, I think that hwloc-calc should store the
> values as unsigned integers and represent them the same way as the kernel.
> BTW, when using the hwloc API, I'm getting the correct values:
>
> obj[0] = hwloc_get_pu_obj_by_os_index(topology, pu_hier]);
> obj[2] = hwloc_get_ancestor_obj_by_type (topology, HWLOC_OBJ_SOCKET, obj[0]);
> obj[2]->os_index => -1
>
> What are your thoughts?

os_index is already *unsigned* in the API (did you mean signed?). We cannot change the obj->os_index back to signed now, it would break existing users.

But being signed wouldn't help much. -1 is the special value HWLOC_UNKNOWN_INDEX, it doesn't matter if it's stored as -1 or 4294967295. Users shouldn't rely on these numbers anyway. However maybe debugging would be easier if tools printed that special value as -1 instead of 4294967295 (I'd need to check other tools too, lstopo takes care of some of these values, maybe not all).

Brice
[hwloc-devel] Negative values for die_id and physical_package_id
Hi Brice,

I would like to get your opinion on the following issue. On IBM LPAR, the kernel reports die_id and physical_package_id to be -1. See [0]

hwloc-calc converts these values into an unsigned integer, resulting in Socket ID 2^32-1:

hwloc-calc --physical-output --intersect socket core:0
4294967295

I'm not quite sure why die_id and physical_package_id are set to -1. Perhaps it signals some error condition. I will try to find out. However, I think that hwloc-calc should store the values as unsigned integers and represent them the same way as the kernel.

BTW, when using the hwloc API, I'm getting the correct values:

obj[0] = hwloc_get_pu_obj_by_os_index(topology, pu_hier]);
obj[2] = hwloc_get_ancestor_obj_by_type (topology, HWLOC_OBJ_SOCKET, obj[0]);
obj[2]->os_index => -1

What are your thoughts?

Thanks a lot!
Jirka

[0] /sys/devices/system/cpu/cpu1/topology
$ cat die_id
-1
$ cat physical_package_id
-1
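For reference, here is a minimal sketch of parsing such a sysfs value as a signed number, so that -1 stays distinguishable from a real ID. The function parse_topology_id is hypothetical and takes the file contents as a string; it is an illustration only, not how hwloc parses these files.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse the text of a sysfs topology file such as
 * physical_package_id ("-1\n", "0\n", ...). Returns 0 and stores the value
 * in *id on success, or -1 if the text is not a valid int. */
static int parse_topology_id(const char *text, int *id)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return -1;              /* no digits, or out of range */
    if (*end != '\0' && *end != '\n')
        return -1;              /* unexpected trailing characters */
    *id = (int)value;
    return 0;
}
```

A caller could then treat any negative id as "unknown" explicitly, rather than letting it wrap to 4294967295 through an unsigned conversion.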