> > > >>> Thank you, Mattias, for the comments and questions; let me try
> > > >>> to explain below.
> > > >>>
> > > >>>> Shouldn't we have a separate CPU/cache hierarchy API instead?
> > > >>>
> > > >>> The intention is to bring in CPU lcores which share the same L3
> > > >>> (for better cache hits and less noisy-neighbour effects), so the
> > > >>> current API focuses on the last-level cache. But if the suggestion
> > > >>> is `there are SoCs where the L2 cache is also shared, and the new
> > > >>> API should be provisioned for that`, I am also comfortable with
> > > >>> the thought.
> > > >>>
> > > >>
> > > >> Rather than some AMD special case API hacked into <rte_lcore.h>, I
> > > >> think we are better off with no DPDK API at all for this kind of 
> > > >> functionality.
> > > >
> > > > Hi Mattias, as shared in the earlier email thread, this is not an
> > > > AMD special case at all. Let me try to explain this one more time.
> > > > One cost-effective technique for increasing core counts is to build
> > > > the CPU from tiles of compute complexes. This results in groups of
> > > > cores sharing the same last-level cache (namely L2, L3 or even L4),
> > > > depending on the cache topology architecture.
> > > >
> > > > The API suggested in the RFC helps end users selectively use cores
> > > > under the same last-level cache hierarchy, as advertised by the OS
> > > > (irrespective of the BIOS settings used). This is useful in both
> > > > bare-metal and container environments.
> > > >
> > >
> > > I'm pretty familiar with AMD CPUs and the use of tiles (including the
> > > challenges these kinds of non-uniformities pose for work scheduling).
> > >
> > > To maximize performance, caring about the core<->LLC relationship may
> > > well not be enough, and more HT/core/cache/memory topology information
> > > is required. That's what I meant by special case. A proper API should
> > > allow access to information about which lcores are SMT siblings, which
> > > cores are on the same L2, and which are on the same L3, to name a few
> > > things. You probably want to fit NUMA into the same API as well,
> > > although that is already available in <rte_lcore.h>.
> >
> > Thank you, Mattias, for the information. As shared in the reply to
> > Anatoly, we want to expose a new API `rte_get_next_lcore_ex`, which
> > takes an extra argument `u32 flags`.
> > The flags can be RTE_GET_LCORE_L1 (SMT), RTE_GET_LCORE_L2,
> > RTE_GET_LCORE_L3, RTE_GET_LCORE_BOOST_ENABLED and
> > RTE_GET_LCORE_BOOST_DISABLED.
> >
> 
> For the naming, would "rte_get_next_sibling_core" (or lcore if you prefer)
> be a clearer name than just adding "ex" on to the end of the existing
> function?
> 
> Looking logically, I'm not sure about the BOOST_ENABLED and BOOST_DISABLED
> flags you propose - in a system with multiple possible standard and boost
> frequencies what would those correspond to? What's also missing is a define
> for getting actual NUMA siblings i.e. those sharing common memory but not
> an L3 or anything else.
> 
> My suggestion would be to have the function take just an integer-type
> parameter, e.g. uint16_t, which defines the memory/cache hierarchy level
> to use, 0 being lowest, 1 next, and so on. Different systems may have
> different numbers of cache levels, so let's just make it a zero-based
> index of levels rather than giving explicit defines (except for memory,
> which should probably always be last). The zero level will be for the
> "closest neighbour", whatever that happens to be, with as many levels as
> are necessary to express the topology. E.g. without SMT, but with 3 cache
> levels, level 0 would be an L2 neighbour and level 1 an L3 neighbour. If
> the L3 was split within a memory NUMA node, then level 2 would give the
> NUMA siblings. We'd just need an API to return the max number of levels
> along with the iterator.

Sounds like a neat idea to me.
