On Tue, Oct 12, 2021 at 02:27:54PM +0200, Igor Mammedov wrote:
> On Tue, 12 Oct 2021 12:37:54 +0200
> Andrew Jones <drjo...@redhat.com> wrote:
> 
> > On Tue, Oct 12, 2021 at 11:40:16AM +0200, Igor Mammedov wrote:
> > > On Wed,  6 Oct 2021 18:22:08 +0800
> > > Gavin Shan <gs...@redhat.com> wrote:
> > >   
> > > > The following option is used to specify the distance map. It's
> > > > possible the option isn't provided by user. In this case, the
> > > > distance map isn't populated and exposed to platform. On the
> > > > other hand, the empty NUMA node, where no memory resides, is
> > > > allowed on ARM64 virt platform. For these empty NUMA nodes,
> > > > their corresponding device-tree nodes aren't populated, but
> > > > their NUMA IDs should be included in the "/distance-map"
> > > > device-tree node, so that kernel can probe them properly if
> > > > device-tree is used.
> > > > 
> > > > >   -numa dist,src=<numa_id>,dst=<numa_id>,val=<distance>
> > > > 
> > > > > So when the user doesn't specify a distance map, we need to generate
> > > > > the default distance map, where the local and remote distances
> > > > > are 10 and 20 respectively. This adds an extra parameter to the
> > > > > existing complete_init_numa_distance() to generate the default
> > > > > distance map for this case.
> > > > 
> > > > Signed-off-by: Gavin Shan <gs...@redhat.com>  
> > > 
> > > 
> > > how about error-ing out if distance map is required but
> > > not provided by user explicitly and asking user to fix
> > > command line?
> > > 
> > > > Reasoning behind this is that defaults are hard to maintain,
> > > > will require compat hacks, and become road blocks down
> > > > the road.
> > > Approach I was taking with generic NUMA code, is deprecating
> > > defaults and replacing them with sanity checks, which bail
> > > out on incorrect configuration and ask user to correct command line.
> > > Hence I dislike approach taken in this patch.
> > > 
> > > If you really wish to provide default, push it out of
> > > generic code into ARM specific one
> > > (then I won't oppose it that much (I think PPC does
> > > some magic like this))
> > > Also behavior seems to be ARM specific so generic
> > > NUMA code isn't a place for it anyways  
> > 
> > The distance-map DT node and the default 10/20 distance-map values
> > aren't arch-specific. RISCV is using it too.
> > 
> > I'm on the fence with this. I see erroring-out to require users
> > to provide explicit command lines as a good thing, but I also
> > see it as potentially an unnecessary burden for those that want
> > the default map anyway. The optional nature of the distance-map
> > node and the specification of the default map is here [1]
> > 
> > [1] Linux source: Documentation/devicetree/bindings/numa.txt
> 
> Looking at proposed linux patches [ https://lkml.org/lkml/2021/9/27/31 ],
> using optional distance table as source for numa-node-ids,
> looks like a hack around kernel's inability to fish them out
> from CPU &| PCI nodes (using those nodes as source should
> cover memory-less node use-case).
> 
> I consider including optional node as a policy decision.
> So user shall include it explicitly on QEMU command line
> if necessary (that works just fine for x86), or guest OS
> can make up defaults on its own in absence of data.

OK, so erroring out on configs that must provide distance-maps, rather
than automatically generating them for all configs, is better.
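
A minimal sketch of that kind of sanity check, written in Python rather
than QEMU's C. All names here (validate_numa_config, node_mem_sizes,
distances) are hypothetical and not QEMU APIs. It assumes the rule under
discussion: a config with an empty (memory-less) node must come with an
explicit distance map, otherwise we bail out and ask the user to fix the
command line.

```python
def validate_numa_config(node_mem_sizes, distances):
    """Raise ValueError if a memory-less node exists but no distance map was given.

    node_mem_sizes: per-node memory sizes in bytes (hypothetical input).
    distances: dict mapping (src, dst) -> distance; empty if the user gave none.
    """
    # Nodes with zero memory are the "empty NUMA nodes" from the thread.
    empty_nodes = [i for i, size in enumerate(node_mem_sizes) if size == 0]
    if empty_nodes and not distances:
        raise ValueError(
            "NUMA nodes %s have no memory; specify distances with "
            "-numa dist,src=<numa_id>,dst=<numa_id>,val=<distance>" % empty_nodes
        )
```

Configs without empty nodes pass untouched, so only the configs that
actually need a distance map are affected.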

> 
> > So, my r-b stands for this patch, but I also wouldn't complain
> > about respinning it to error out instead.
> 
> > I would complain about
> > moving the logic to Arm specific code, though, since RISCV would
> > then need to duplicate it.
> 
> Instead of putting workarounds in QEMU and then making them generic,
> I'd prefer to:
>  1. make QEMU able to generate DT with memory-less nodes

How? DT syntax doesn't allow this, because each node needs a unique
name which is derived from its base address, which an empty numa
node doesn't have.

>  2. fix guest to get numa-node-id from CPU/PCI nodes if
>     memory node isn't present,

I'm not sure that's possible with DT. If it is, then proposing it
upstream to Linux DT maintainers would be the next step.

> or use ACPI tables which can
>     describe memory-less NUMA nodes if fixing how DT is
>     parsed is unfeasible.

We use ACPI already for our guests, but we also generate a DT (which
edk2 consumes). We can't generate a valid DT when empty numa nodes
are put on the command line unless we follow a DT spec saying how
to do that. The current spec says we should have a distance-map
that contains those nodes.
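
To illustrate what "contains those nodes" means, here is a rough sketch
in Python (not QEMU code; default_distance_matrix and the constants are
hypothetical names). It assumes the default 10/20 local/remote values
mentioned earlier and emits one (src, dst, distance) triple per pair of
node IDs, which is the shape of data a distance-map carries.

```python
LOCAL_DISTANCE = 10   # default local distance from the discussion above
REMOTE_DISTANCE = 20  # default remote distance from the discussion above


def default_distance_matrix(num_nodes):
    """Return (src, dst, distance) triples for every pair of node IDs."""
    return [
        (src, dst, LOCAL_DISTANCE if src == dst else REMOTE_DISTANCE)
        for src in range(num_nodes)
        for dst in range(num_nodes)
    ]
```

With three nodes where node 2 has no memory, node 2 gets no memory node
in the DT, but its ID still shows up in triples such as (0, 2, 20) and
(2, 2, 10), so the guest can learn about it from the distance-map.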

Thanks,
drew

