>
> The ACPI SLIT table (reported by numactl -H) was indeed often dumb or even
> wrong. But SLIT wasn't widely used anyway, so vendors didn't care much
> about putting valid info there, it didn't break anything in most
> applications. Hopefully it won't be the case for HMAT because HMAT will be
> the official way to figure out which target memory is fast or not. If
> vendors don't fill it properly, the OS may use HBM or NVDIMMs by default
> instead of DDR, which will likely cause more problems than a broken SLIT.


Right. Even now, SLIT values have an impact on the Linux scheduler.  See
this: https://www.codeblueprint.co.uk/2019/07/12/what-are-slit-tables.html

"The current magic value used inside Linux kernel is 30 – if the NUMA node
distance between two nodes is more than 30, the Linux kernel scheduler will
try not to migrate tasks between them."
https://github.com/torvalds/linux/blob/master/include/linux/topology.h#L60
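
To illustrate that threshold, here is a minimal sketch that flags the nodes in one row of the distance matrix that are farther than 30. The helper name far_nodes and the sample row are hypothetical. It only mirrors the rule as quoted above and is not the kernel's actual logic. On Linux, real rows can be read from /sys/devices/system/node/nodeN/distance.

```shell
#!/bin/sh
# Sketch only: THRESHOLD comes from the article quoted above.
THRESHOLD=30

# far_nodes "<row>": print the index of every node whose distance in the
# given space-separated row is strictly greater than THRESHOLD.
far_nodes() {
  idx=0
  for d in $1; do
    if [ "$d" -gt "$THRESHOLD" ]; then
      echo "$idx"
    fi
    idx=$((idx + 1))
  done
}

# Hypothetical row for node 0 of a 4-node machine.
far_nodes "10 21 31 40"
```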

> There's an example at the end of the manpage of hwloc-annotate. It's very
> similar to your line, but you likely need a capital to "Bandwidth".

Yes, it works as expected when used with the capital "B". See [1].

> I'll see if I can make things case-insensitive in the tools (not in the C
> API).

Yes, it would be a nice improvement.  Currently, there is a mismatch
between different commands.  hwloc-info supports both bandwidth and
Bandwidth, but hwloc-annotate requires a capital letter.

hwloc-info --best-memattr bandwidth
hwloc-info --best-memattr Bandwidth
hwloc-annotate in.xml out.xml node:0 memattr Bandwidth node:0 18 && mv out.xml in.xml
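
Until the tools accept either spelling, a small wrapper can normalize the name before calling hwloc-annotate. This is a minimal sketch, assuming only the first letter needs to be upper case (true for Capacity, Locality, Bandwidth and Latency). The helper name capitalize is hypothetical, and the command is only printed, not executed.

```shell
#!/bin/sh
# capitalize "<name>": upper-case the first character, keep the rest as-is.
capitalize() {
  first=$(printf '%s' "$1" | cut -c1 | tr '[:lower:]' '[:upper:]')
  rest=$(printf '%s' "$1" | cut -c2-)
  printf '%s%s\n' "$first" "$rest"
}

# Dry run: show the command that would be run with the normalized name.
attr=$(capitalize bandwidth)
echo "hwloc-annotate in.xml out.xml node:0 memattr $attr node:0 18"
```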

Thank you very much!
Jirka


[1]
lstopo in.xml
hwloc-annotate in.xml out.xml node:0 memattr Bandwidth node:0 18 && mv out.xml in.xml
hwloc-annotate in.xml out.xml node:0 memattr Bandwidth node:1 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:0 memattr Bandwidth node:2 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:0 memattr Bandwidth node:3 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:1 memattr Bandwidth node:0 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:1 memattr Bandwidth node:1 18 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:1 memattr Bandwidth node:2 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:1 memattr Bandwidth node:3 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:2 memattr Bandwidth node:0 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:2 memattr Bandwidth node:1 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:2 memattr Bandwidth node:2 18 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:2 memattr Bandwidth node:3 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:3 memattr Bandwidth node:0 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:3 memattr Bandwidth node:1 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:3 memattr Bandwidth node:2 9 && mv -f out.xml in.xml
hwloc-annotate in.xml out.xml node:3 memattr Bandwidth node:3 18 && mv -f out.xml in.xml
$ lstopo-no-graphics --input in.xml --memattrs
Memory attribute #0 name `Capacity' flags 1
 NUMANode L#0 = 16469168128
 NUMANode L#1 = 16908922880
 NUMANode L#2 = 16881680384
 NUMANode L#3 = 16908451840
Memory attribute #1 name `Locality' flags 2
 NUMANode L#0 = 8
 NUMANode L#1 = 8
 NUMANode L#2 = 8
 NUMANode L#3 = 8
Memory attribute #2 name `Bandwidth' flags 5
 NUMANode L#0 = 18 (NUMANode L#0)
 NUMANode L#0 = 9 (NUMANode L#1)
 NUMANode L#0 = 9 (NUMANode L#2)
 NUMANode L#0 = 9 (NUMANode L#3)
 NUMANode L#1 = 9 (NUMANode L#0)
 NUMANode L#1 = 18 (NUMANode L#1)
 NUMANode L#1 = 9 (NUMANode L#2)
 NUMANode L#1 = 9 (NUMANode L#3)
 NUMANode L#2 = 9 (NUMANode L#0)
 NUMANode L#2 = 9 (NUMANode L#1)
 NUMANode L#2 = 18 (NUMANode L#2)
 NUMANode L#2 = 9 (NUMANode L#3)
 NUMANode L#3 = 9 (NUMANode L#0)
 NUMANode L#3 = 9 (NUMANode L#1)
 NUMANode L#3 = 9 (NUMANode L#2)
 NUMANode L#3 = 18 (NUMANode L#3)
Memory attribute #3 name `Latency' flags 6
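
The sixteen commands in [1] follow a regular pattern, so they could also be generated with a loop. This is a minimal dry-run sketch: it only prints the commands. The function name gen_annotate_cmds and the variable names are hypothetical, and the target/initiator labels are my reading of the output above, where "NUMANode L#0 = 9 (NUMANode L#1)" is taken to be the value for target node 0 seen from initiator node 1. The values 18 (local) and 9 (remote) are the ones used in [1].

```shell
#!/bin/sh
# Dry run: print one hwloc-annotate command per (target, initiator) pair.
nodes="0 1 2 3"
local_bw=18    # value used when target and initiator are the same node
remote_bw=9    # value used for every other pair

gen_annotate_cmds() {
  for target in $nodes; do
    for initiator in $nodes; do
      if [ "$target" = "$initiator" ]; then
        bw=$local_bw
      else
        bw=$remote_bw
      fi
      echo "hwloc-annotate in.xml out.xml node:$target memattr Bandwidth node:$initiator $bw && mv -f out.xml in.xml"
    done
  done
}

gen_annotate_cmds
```

Piping the output to sh would run the commands for real, after "lstopo in.xml" has created the input file.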


On Fri, Oct 2, 2020 at 12:43 AM Brice Goglin <brice.gog...@inria.fr> wrote:

> On 01/10/2020 at 22:17, Jirka Hladky wrote:
>
>
> This is interesting! ACPI tables are often wrong - having the option to
> annotate more accurate data to the hwloc is great.
>
>
> The ACPI SLIT table (reported by numactl -H) was indeed often dumb or even
> wrong. But SLIT wasn't widely used anyway, so vendors didn't care much
> about putting valid info there, it didn't break anything in most
> applications. Hopefully it won't be the case for HMAT because HMAT will be
> the official way to figure out which target memory is fast or not. If
> vendors don't fill it properly, the OS may use HBM or NVDIMMs by default
> instead of DDR, which will likely cause more problems than a broken SLIT.
>
>
> We have a simple C program to measure the bandwidth between NUMA nodes,
> producing a table similar to the output of numactl -H (but with values in
> GB/s).
>
> node   0   1   2   3
>  0:  10  16  16  16
>  1:  16  10  16  16
>  2:  16  16  10  16
>  3:  16  16  16  10
>
> I was trying to annotate it using hwloc-annotate, but I have not
> succeeded:
>
> lstopo in.xml
> hwloc-annotate in.xml out.xml node:0 memattr bandwidth node:0 18
> Failed to find memattr by name bandwidth
>
> Is there some example of how to do this?
>
>
> There's an example at the end of the manpage of hwloc-annotate. It's very
> similar to your line, but you likely need a capital to "Bandwidth". I'll
> see if I can make things case-insensitive in the tools (not in the C API).
>
>
>
> Also, are there any plans for having a tool, which would measure the
> memory bandwidth and annotate the results to XML for later usage with hwloc
> commands?
>
>
> We've been talking about this for years. Having a good performance
> measurement tool isn't easy. I see people sending patches for adding some
> assembly because this corner case on this processor isn't well optimized by
> GCC :/ I am not sure we want to put this inside hwloc.
>
> Brice
>
>
>
> On Thu, Oct 1, 2020 at 7:28 PM Brice Goglin <brice.gog...@inria.fr> wrote:
>
>>
>> On 01/10/2020 at 19:16, Jirka Hladky wrote:
>>
>> Hi Brice,
>>
>> this new feature sounds very interesting!
>>
>> Add hwloc/memattrs.h for exposing latency/bandwidth information
>>>     between initiators (CPU sets for now) and target NUMA nodes,
>>>     typically on heterogeneous platforms.
>>
>>
>> If I get it right, I need to have an ACPI HMAT table on the system to use
>> the new functionality, right?
>>
>>
>> Hello Jirka
>>
>> It's also possible to add memory attributes using the C API, or with
>> hwloc-annotate to modify an XML file (you may create attributes, or add
>> values for a given attribute).
>>
>>
>> I have tried the following on Fedora:
>> acpidump -o acpidump.bin
>> acpixtract -a acpidump.bin
>>
>> but there is no HMAT table reported. So it seems I'm out of luck, and I
>> cannot test the new functionality, right?
>>
>>
>> Besides KNL (which is too old to have HMAT, but hwloc now provides
>> hardwired bandwidth/latency values), the only platforms with heterogeneous
>> memories right now are Intel machines with Optane DCPMM (NVDIMMs). Some
>> have a HMAT, some don't. If your machine doesn't, it's possible to provide
>> a custom HMAT table in the initrd. That's not easy, so adding attribute
>> values with hwloc-annotate might be easier.
>>
>>
>>
>> Also, where can we find the list of attributes supported
>> by --best-memattr?
>>   --best-memattr <attr> Only display the best target among the local nodes
>>
>>
>> There are 4 standard attributes defined in hwloc/memattrs.h: capacity,
>> locality, latency and bandwidth. They are also visible in lstopo -vv or
>> lstopo --memattrs. I'll add something in the doc.
>>
>>
>>
>> By trial and error, I have found out that latency and bandwidth are
>> supported. Are there any others? Could you please add the list to
>> hwloc-info -h?
>>
>>
>> I could add the default ones, but I'll need to specify that additional
>> user-given attributes may exist.
>>
>> Thanks for the feedback.
>>
>> Brice
>>
>>
>>
>>
>> hwloc-info --best-memattr bandwidth
>> hwloc-info --best-memattr latency
>>
>> Thanks a lot!
>> Jirka
>>
>>
>> On Thu, Oct 1, 2020 at 12:45 AM Brice Goglin <brice.gog...@inria.fr>
>> wrote:
>>
>>> hwloc (Hardware Locality) 2.3.0 is now available for download.
>>>
>>>     https://www.open-mpi.org/software/hwloc/v2.3/ 
>>> <https://www.open-mpi.org/software/hwloc/v2.0/>
>>>
>>> v2.3.0 brings quite a lot of changes. The biggest one is the addition
>>> of the memory attribute API to expose hardware information that vendors
>>> are (slowly) adding to ACPI tables to describe heterogeneous memory
>>> platforms (mostly DDR+NVDIMMs right now).
>>>
>>> The following is a summary of the changes since v2.2.0.
>>>
>>> Version 2.3.0
>>> -------------
>>> * API
>>>   + Add hwloc/memattrs.h for exposing latency/bandwidth information
>>>     between initiators (CPU sets for now) and target NUMA nodes,
>>>     typically on heterogeneous platforms.
>>>     - When available, bandwidths and latencies are read from the ACPI HMAT
>>>       table exposed by Linux kernel 5.2+.
>>>     - Attributes may also be customized to expose user-defined performance
>>>       information.
>>>   + Add hwloc_get_local_numanode_objs() for listing NUMA nodes that are
>>>     local to some locality.
>>>   + The new topology flag HWLOC_TOPOLOGY_FLAG_IMPORT_SUPPORT causes
>>>     support arrays to be loaded from XML exported with hwloc 2.3+.
>>>     - hwloc_topology_get_support() now returns an additional "misc"
>>>       array with feature "imported_support" set when support was imported.
>>>   + Add hwloc_topology_refresh() to refresh internal caches after modifying
>>>     the topology and before consulting the topology in a multithread
>>>     context.
>>> * Backends
>>>   + Add a ROCm SMI backend and a hwloc/rsmi.h helper file for getting
>>>     the locality of AMD GPUs, now exposed as "rsmi" OS devices.
>>>     Thanks to Mike Li.
>>>   + Remove POWER device-tree-based topology on Linux,
>>>     (it was disabled by default since 2.1).
>>> * Tools
>>>   + Command-line options for specifying flags now understand comma-separated
>>>     lists of flag names (substrings).
>>>   + hwloc-info and hwloc-calc have new --local-memory --local-memory-flags
>>>     and --best-memattr options for reporting local memory nodes and
>>>     filtering by memory attributes.
>>>   + hwloc-bind has a new --best-memattr option for filtering by memory
>>>     attributes among the memory binding set.
>>>   + Tools that have a --restrict option may now receive a nodeset or
>>>     some custom flags for restricting the topology.
>>>   + lstopo now has a --thickness option for changing line thickness in the
>>>     graphical output.
>>>   + Fix lstopo drawing when autoresizing on Windows 10.
>>>   + Pressing the F5 key in lstopo X11 and Windows graphical/interactive
>>>     outputs now refreshes the display according to the current topology
>>>     and binding.
>>>   + Add a tikz lstopo graphical backend to generate picture easily included
>>>     into LaTeX documents. Thanks to Clement Foyer.
>>> * Misc
>>>   + The default installation path of the Bash completion file has changed to
>>>     ${datadir}/bash-completion/completions/hwloc. Thanks to Tomasz Kłoczko.
>>>
>>>
>>> Changes since 2.3.0rc1 are negligible.
>>> --
>>> Brice
>>>
>>>
>>> _______________________________________________
>>> hwloc-announce mailing list
>>> hwloc-annou...@lists.open-mpi.org
>>> https://lists.open-mpi.org/mailman/listinfo/hwloc-announce
>>
>>
>>
>> --
>> -Jirka
>>
>> _______________________________________________
>> hwloc-users mailing list
>> hwloc-users@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/hwloc-users
>
>
>
> --
> -Jirka
>



-- 
-Jirka
