On 25.11.20 11:47, Andrew Jones wrote:
> On Wed, Nov 25, 2020 at 09:45:19AM +0100, David Hildenbrand wrote:
>> On 25.11.20 09:38, Andrew Jones wrote:
>>> On Tue, Nov 24, 2020 at 08:17:35PM +0100, David Hildenbrand wrote:
>>>> On 24.11.20 19:11, Jonathan Cameron wrote:
>>>>> On Mon, 9 Nov 2020 20:47:09 +0100
>>>>> David Hildenbrand <da...@redhat.com> wrote:
>>>>>
>>>>> +CC Eric based on similar query in other branch of the thread.
>>>>>
>>>>>> On 05.11.20 18:43, Jonathan Cameron wrote:
>>>>>>> Basically a cut and paste job from the x86 support, with the exception of
>>>>>>> needing a larger block size, as the Memory Block Size (MIN_SECTION_SIZE)
>>>>>>> on ARM64 in Linux is 1G.
>>>>>>>
>>>>>>> Tested:
>>>>>>> * In full emulation and with KVM on an arm64 server.
>>>>>>> * Cold plug and hotplug of the virtio-mem-pci device.
>>>>>>> * Wide range of memory sizes, added at creation and later.
>>>>>>> * Fairly basic memory usage of memory added. Seems to function as normal.
>>>>>>> * NUMA setup with virtio-mem-pci devices on each node.
>>>>>>> * Simple migration test.
>>>>>>>
>>>>>>> The related kernel patch just enables the Kconfig item for ARM64 as an
>>>>>>> alternative to x86 in drivers/virtio/Kconfig.
>>>>>>>
>>>>>>> The original patches from David Hildenbrand stated that he thought it should
>>>>>>> work for ARM64 but it wasn't enabled in the kernel [1].
>>>>>>> It appears he was correct and everything 'just works'.
>>>>>>>
>>>>>>> The build-system-related changes are intended to ensure virtio-mem support
>>>>>>> is not built for arm32 (the build would fail due to no defined block size).
>>>>>>> If there is a more elegant way to do this, please point me in the right
>>>>>>> direction.
>>>>>>
>>>>>> You might be aware of https://virtio-mem.gitlab.io/developer-guide.html
>>>>>> and the "issue" with 64k base pages - 512MB granularity. Similar to the
>>>>>> question from Auger, have you tried running arm64 with differing page
>>>>>> sizes in host/guest?
>>>>>>
>>>>>
>>>>> Hi David,
>>>>>
>>>>>> With recent kernels, you can use "memhp_default_state=online_movable" on
>>>>>> the kernel cmdline to make memory unplug more likely to succeed -
>>>>>> especially with 64k base pages. You just have to be sure to not hotplug
>>>>>> "too much memory" to a VM.
>>>>>
>>>>> Thanks for the pointer - that definitely simplifies testing. It was getting
>>>>> a bit tedious without it.
>>>>>
>>>>> As ever, other stuff got in the way, so I only just got back to looking at
>>>>> this.
>>>>>
>>>>> I've not done a particularly comprehensive set of tests yet, but things seem
>>>>> to 'work' with mixed page sizes.
>>>>>
>>>>> With 64K pages in general, you run into a problem with the device block_size
>>>>> being smaller than the subblock_size. I've just added a check for that into the
>>>>
>>>> "device block size smaller than subblock size" - that's very common,
>>>> e.g., on x86-64.
>>>>
>>>> E.g., device_block_size is 2MiB, subblock size 4MiB - until we improve
>>>> that in the future in Linux guests.
>>>>
>>>> Or did you mean something else?
>>>>
>>>>> virtio-mem kernel driver and have it fail to probe if that happens. I don't
>>>>> think such a setup makes any sense anyway, so no loss there. Should it make
>>>>> sense to drop that restriction in the future, we can deal with that then
>>>>> without breaking backwards compatibility.
>>>>>
>>>>> So the question is whether it makes sense to bother with virtio-mem support
>>>>> at all on ARM64 with 64k pages, given that currently the minimum workable
>>>>> block_size is 512MiB? I guess there is an argument that virtio-mem is a
>>>>> possibly more convenient interface than full memory HP. Curious to hear
>>>>> what people think on this?
>>>>
>>>> IMHO we really want it. For example, RHEL is always 64k. This is a
>>>> current guest limitation, to be improved in the future - either by
>>>> moving away from 512MB huge pages with 64k or by improving
>>>> alloc_contig_range().
>>>
>>> Even with 64k pages you may be able to have 2MB huge pages by setting
>>> default_hugepagesz=2M on the kernel command line.
>>
>> Yes, but not for THP, right? Last time I checked, that move had not
>> been made yet - resulting in MAX_ORDER/pageblock_order in Linux
>> corresponding to 512 MB.
>>
>
> Yes, I believe you're correct. At least on the machine I've booted with
> default_hugepagesz=2M, I see
>
> $ cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
> 536870912
>
> (I'm not running the latest mainline kernel though.)
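As a side note on where that number comes from: with 64k base pages each
page-table level holds 8192 eight-byte entries, so a PMD-level huge page -
and hence the THP size reported above - spans 64KiB * 8192 = 512MiB, versus
4KiB * 512 = 2MiB with 4k pages. A quick, purely illustrative shell check:

$ echo $(( (64 * 1024) * 8192 ))   # 64k pages, 8192 PMD entries -> 512MiB
536870912
$ echo $(( (4 * 1024) * 512 ))     # 4k pages, 512 PMD entries -> 2MiB
2097152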
I remember some upstream discussions where people raised that switching to
2 MB THP might be possible (implemented via cont bits in the page tables -
similar to the 2MB huge pages you mentioned). 512 MB really sounds more like
gigantic pages after all.

-- 
Thanks,

David / dhildenb
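For anyone wanting to reproduce the kind of test described at the top of the
thread, here is a rough sketch of an arm64 invocation. The virtio-mem-pci
properties (memdev, requested-size, block-size) are QEMU's, but the sizes,
IDs and the rest of the VM configuration are made-up placeholders, and it of
course assumes a QEMU build with the arm64 virtio-mem support from this
series:

# Illustrative only - sizes/IDs are placeholders; needs arm64 virtio-mem support.
qemu-system-aarch64 \
    -machine virt,accel=kvm -cpu host -smp 4 \
    -m 4G,maxmem=20G \
    -object memory-backend-ram,id=vmem0,size=16G \
    -device virtio-mem-pci,id=vm0,memdev=vmem0,requested-size=2G,block-size=512M \
    ...

Inside the guest, booting with memhp_default_state=online_movable (as
suggested earlier in the thread) onlines the hotplugged blocks as MOVABLE,
which makes later unplug more likely to succeed.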