On 04/06/17 17:39 +0800, Xiao Guangrong wrote:
>
>
> On 31/03/2017 4:41 PM, Haozhong Zhang wrote:
> > This patch series constructs the flush hint address structures for
> > nvdimm devices in QEMU.
> >
> > It's of course not for 2.9. I'm sending it out early in order to get
> > comments on one point I'm uncertain about (see the detailed
> > explanation below). Thanks for any comments in advance!
> >
> >
> > Background
> > ---------------
> > The flush hint address structure is a substructure of NFIT and
> > specifies one or more addresses, namely Flush Hint Addresses.
> > Software can write to any one of these flush hint addresses to cause
> > any preceding writes to the NVDIMM region to be flushed out of the
> > intervening platform buffers to the targeted NVDIMM. More details can
> > be found in ACPI Spec 6.1, Section 5.2.25.8 "Flush Hint Address
> > Structure".
> >
> >
> > Why is it RFC?
> > ---------------
> > RFC is added because I'm not sure whether the way this patch series
> > allocates the guest flush hint addresses is right.
> >
> > QEMU needs to trap guest accesses (at least writes) to the flush
> > hint addresses in order to perform the necessary flush on the host
> > backing store. Therefore, QEMU needs to create I/O memory regions
> > that cover those flush hint addresses. In order to create those I/O
> > memory regions, QEMU needs to know the flush hint addresses, or their
> > offsets from other known memory regions, in advance. So far so good.
> >
> > Flush hint addresses are in the guest address space. Looking at how
> > the current NVDIMM ACPI code in QEMU allocates the DSM buffer, it's
> > natural to take the same approach for flush hint addresses, i.e. let
> > the guest firmware allocate from free addresses and patch them into
> > the flush hint address structure.
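[For reference, the substructure described above can be sketched as a packed C struct. The field widths follow my reading of ACPI 6.1, Section 5.2.25.8 (16 header bytes followed by 8-byte addresses); the type name and helper below are illustrative, not QEMU's actual NFIT-building code.]

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of the ACPI 6.1 Flush Hint Address Structure (Section 5.2.25.8).
 * Illustrative only; names are not taken from QEMU's source. */
#pragma pack(push, 1)
typedef struct NfitFlushHints {
    uint16_t type;        /* 6: Flush Hint Address Structure */
    uint16_t length;      /* total byte length of this structure */
    uint32_t nfit_handle; /* NFIT Device Handle of the NVDIMM */
    uint16_t hint_count;  /* Number of Flush Hint Addresses */
    uint8_t  reserved[6];
    uint64_t hint_addr[]; /* hint_count guest physical addresses */
} NfitFlushHints;
#pragma pack(pop)

/* Byte length the Length field must carry for a given hint count:
 * 16 header bytes plus 8 bytes per flush hint address. */
static inline uint16_t flush_hints_length(uint16_t hint_count)
{
    return (uint16_t)(sizeof(NfitFlushHints) +
                      hint_count * sizeof(uint64_t));
}
```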
> > (*Please correct me if my following understanding is wrong.*)
> >
> > However, the current allocation and pointer patching are transparent
> > to QEMU, so QEMU will be unaware of the flush hint addresses, and
> > consequently have no way to create the corresponding I/O memory
> > regions in order to trap guest accesses.
>
> Er, it is awkward, and the flush-hint table is static, which may not be
> easily patched.
>
> >
> > Alternatively, this patch series moves the allocation of flush hint
> > addresses to QEMU:
> >
> > 1. (Patch 1) We reserve an address range after the end address of each
> >    nvdimm device. Its size is specified by the user via a new pc-dimm
> >    option 'reserved-size'.
>
> We should make it only work for nvdimm?
>
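[The trap-and-flush behavior QEMU needs can be modeled in a few lines: a guest write that lands inside the window reserved after the NVDIMM triggers a host-side flush. The types and names below are a toy illustration, not QEMU's MemoryRegion API; real code would register an I/O memory region whose write callback flushes the backing store.]

```c
#include <stdint.h>

/* Toy model of a flush hint window: a reserved range of guest physical
 * addresses whose writes QEMU must intercept. Illustrative names only. */
typedef struct FlushWindow {
    uint64_t base;    /* guest physical base of the reserved window */
    uint64_t size;    /* window size, e.g. 4 KiB */
    unsigned flushed; /* flushes the "host" has performed */
} FlushWindow;

/* Returns 1 (and flushes) if gpa falls inside the window, else 0. */
static int flush_window_write(FlushWindow *w, uint64_t gpa)
{
    if (gpa < w->base || gpa >= w->base + w->size) {
        return 0; /* ordinary RAM access, not trapped */
    }
    w->flushed++; /* real code would fsync()/msync() the host file */
    return 1;
}
```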
Yes, we can check whether the machine option 'nvdimm' is present when
plugging the nvdimm.

> > For the following example,
> >     -object memory-backend-file,id=mem0,size=4G,...
> >     -device nvdimm,id=dimm0,memdev=mem0,reserved-size=4K,...
> >     -device pc-dimm,id=dimm1,...
> > if dimm0 is allocated to address N ~ N+4G, the address of dimm1
> > will start from N+4G+4K or higher. N+4G ~ N+4G+4K is reserved for
> > dimm0.
> >
> > 2. (Patch 4) When the NVDIMM ACPI code builds the flush hint address
> >    structure for each nvdimm device, it will allocate the addresses
> >    from the above reserved area, e.g. the flush hint addresses of
> >    dimm0 above are allocated in N+4G ~ N+4G+4K. The addresses are
> >    known to QEMU in this way, so QEMU can easily create I/O memory
> >    regions for them.
> >
> >    If the reserved area is not present or too small, QEMU will report
> >    an error.
>
> We should make 'reserved-size' always be page-aligned, and it should be
> transparent to the user, i.e., automatically reserve 4K if 'flush-hint'
> is specified?
>

4K alignment is already enforced by the current memory plug code. As for
the automatic reservation: is a non-zero default value acceptable by QEMU
design/convention in general?

Thanks,
Haozhong
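[The address arithmetic in the example above is straightforward to write down. These are hypothetical helpers for illustration, not functions from QEMU's memory plug code; `align_up` assumes a power-of-two alignment.]

```c
#include <stdint.h>

/* Round v up to the next multiple of align (align must be a power of
 * two), matching the suggestion that 'reserved-size' always be
 * page-aligned. Hypothetical helper, not QEMU code. */
static uint64_t align_up(uint64_t v, uint64_t align)
{
    return (v + align - 1) & ~(align - 1);
}

/* With dimm0 occupying [N, N+size) and a page-aligned reservation after
 * it, the next dimm can start no lower than N + size + reserved. */
static uint64_t next_dimm_min_addr(uint64_t base, uint64_t dimm_size,
                                   uint64_t reserved)
{
    return base + dimm_size + align_up(reserved, 4096);
}
```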