On 09.07.20 12:52, Christian Borntraeger wrote:
>
> On 08.07.20 20:51, David Hildenbrand wrote:
>> Let's implement the "storage configuration" part of diag260. This diag
>> is found under z/VM, to indicate usable chunks of memory to the guest OS.
>> As I don't have access to documentation, I have no clue what the actual
>> error cases are, and which other stuff we could eventually query using this
>> interface. Somebody with access to documentation should fix this. This
>> implementation seems to work with Linux guests just fine.
>>
>> The Linux kernel supports diag260 to query the available memory since
>> v4.20. Older kernels / kvm-unit-tests will later fail to run in such a VM
>> (with maxmem being defined and bigger than the memory size, e.g., "-m
>> 2G,maxmem=4G"), just as if support for SCLP storage information is not
>> implemented. They will fail to detect the actual initial memory size.
>>
>> This interface allows us to expose the maximum ramsize via sclp
>> and the initial ramsize via diag260 - without having to mess with the
>> memory increment size and having to align the initial memory size to it.
>>
>> This is a preparation for memory device support. We'll unlock the
>> implementation with a new QEMU machine that supports memory devices.
>>
>> Signed-off-by: David Hildenbrand <da...@redhat.com>
>
> I have not looked into this, so this is purely a question.
>
> Is there a way to hotplug virtio-mem memory beyond the initial size of
> the memory (as specified by the initial sclp)? Then we could avoid doing
> this platform specific diag260?

We need a way to tell the guest about the maximum possible PFN, so it
can prepare for that. E.g., on x86-64 this is usually done via ACPI SRAT
tables. On s390x, the only way I see is using a combination of diag260
and SCLP, without introducing any other new mechanisms.

Currently, Linux selects 3 vs. 4 level page tables based on that size (I
think that's what you were referring to with the 4TB limit). I can see
that kasan also does some magic based on the value ("populate kasan
shadow for untracked memory"), but I did not look into the details. I
*think* kasan will never be able to track that memory, but I am not
completely sure.

I'd like to avoid something like what you propose (that's why I searched
for and discovered diag260 after all :) ), especially so that we don't
silently break in the future, when other assumptions based on that value
are introduced.

E.g., on my z/VM LinuxOne Community Cloud machine, diag260 gets used by
default, so it does not seem to be a corner-case mechanism nowadays.

> the only issue I see is when we need to go beyond 4TB due to the page table
> upgrade in the kernel.
>
> FWIW diag 260 is publicly documented.

Yeah, Conny pointed me at the doc - makes things easier :)

-- 
Thanks,

David / dhildenb
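
To make the 3 vs. 4 level page table point above concrete, here is a
minimal sketch in C. It is not the kernel's actual code: the names
THREE_LEVEL_LIMIT, page_table_levels() and sizing_limit() are
hypothetical, and the only input taken from this thread is the 4TB
boundary. The real kernel likely factors in more than the RAM size.

```c
#include <stdint.h>

/* Assumption from the thread: 3-level page tables reach up to 4 TiB. */
#define THREE_LEVEL_LIMIT (UINT64_C(1) << 42)

/*
 * Hypothetical helper: pick the number of page table levels from the
 * exclusive end of the highest address the guest may ever have to map.
 */
static inline int page_table_levels(uint64_t max_addr_end)
{
    return max_addr_end <= THREE_LEVEL_LIMIT ? 3 : 4;
}

/*
 * Hypothetical helper: the size to prepare for. If a maximum RAM size
 * larger than the initial RAM size is known, use the maximum; otherwise
 * fall back to the initial size.
 */
static inline uint64_t sizing_limit(uint64_t initial_ram, uint64_t max_ram)
{
    return max_ram > initial_ram ? max_ram : initial_ram;
}
```

With "-m 2G,maxmem=4G", sizing_limit() yields 4 GiB and 3 levels are
enough. A guest whose maximum lies beyond 4 TiB gets 4 levels in this
sketch even if its initial RAM is small, which is why the guest has to
learn the maximum early.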