Re: [PATCH v3 00/32] implement vNVDIMM
On Sun, Oct 11, 2015 at 11:52:32AM +0800, Xiao Guangrong wrote: > Changelog in v3: > There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo, > Michael for their valuable comments, the patchset finally gets better shape. Thanks! This needs some changes in coding style, and more comments, to make it easier to maintain going forward. High level comments - I didn't point out all instances, please go over code and locate them yourself. I focused on acpi code in this review. - fix coding style violations, prefix eveything with nvdimm_ etc - in apci code, avoid manual memory management/complex pointer math - comments are needed to document apis & explain what's going on - constants need comments too, refer to text that can be looked up in acpi spec verbatim > - changes from Igor's comments: > 1) abstract dimm device type from pc-dimm and create nvdimm device based on > dimm, then it uses memory backend device as nvdimm's memory and NUMA has > easily been implemented. > 2) let file-backend device support any kind of filesystem not only for > hugetlbfs and let it work on file not only for directory which is > achieved by extending 'mem-path' - if it's a directory then it works as > current behavior, otherwise if it's file then directly allocates memory > from it. > 3) we figure out a unused memory hole below 4G that is 0xFF0 ~ > 0xFFF0, this range is large enough for NVDIMM ACPI as build 64-bit > ACPI SSDT/DSDT table will break windows XP. > BTW, only make SSDT.rev = 2 can not work since the width is only depended > on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition Block) > in ACPI spec: > | Note: For compatibility with ACPI versions before ACPI 2.0, the bit > | width of Integer objects is dependent on the ComplianceRevision of the DSDT. > | If the ComplianceRevision is less than 2, all integers are restricted to 32 > | bits. Otherwise, full 64-bit integers are used. The version of the DSDT > sets > | the global integer width for all integers, including integers in SSDTs. > 4) use the lowest ACPI spec version to document AML terms. > 5) use "nvdimm" as nvdimm device name instead of "pc-nvdimm" > > - changes from Stefan's comments: > 1) do not do endian adjustment in-place since _DSM memory is visible to > guest > 2) use target platform's target page size instead of fixed PAGE_SIZE > definition > 3) lots of code style improvement and typo fixes. > 4) live migration fix > - changes from Paolo's comments: > 1) improve the name of memory region > > - other changes: > 1) return exact buffer size for _DSM method instead of the page size. > 2) introduce mutex in NVDIMM ACPI as the _DSM memory is shared by all nvdimm > devices. > 3) NUMA support > 4) implement _FIT method > 5) rename "configdata" to "reserve-label-data" > 6) simplify _DSM arg3 determination > 7) main changelog update to let it reflect v3. > > Changlog in v2: > - Use litten endian for DSM method, thanks for Stefan's suggestion > > - introduce a new parameter, @configdata, if it's false, Qemu will > build a static and readonly namespace in memory and use it serveing > for DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no > reserved region is needed at the end of the @file, it is good for > the user who want to pass whole nvdimm device and make its data > completely be visible to guest > > - divide the source code into separated files and add maintain info > > BTW, PCOMMIT virtualization on KVM side is work in progress, hopefully will > be posted on next week > > == Background == > NVDIMM (A Non-Volatile Dual In-line Memory Module) is going to be supported > on Intel's platform. They are discovered via ACPI and configured by _DSM > method of NVDIMM device in ACPI. There has some supporting documents which > can be found at: > ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf > NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf > DSM Interface Example: > http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf > Driver Writer's Guide: > http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf > > Currently, the NVDIMM driver has been merged into upstream Linux Kernel and > this patchset tries to enable it in virtualization field > > == Design == > NVDIMM supports two mode accesses, one is PMEM which maps NVDIMM into CPU's > address space then CPU can directly access it as normal memory, another is > BLK which is used as block device to reduce the occupying of CPU address > space > > BLK mode accesses NVDIMM via Command Register window and Data Register window. > BLK virtualization has high workload since each sector access will cause at > least two VM-EXIT. So we currently only imperilment vPMEM in this patchset > > --- vPMEM design --- > We introduce a new device named "nvdimm", it uses memory
Re: [PATCH v3 00/32] implement vNVDIMM
On Tue, Oct 13, 2015 at 01:29:48PM +0800, Xiao Guangrong wrote: > > > On 10/12/2015 07:55 PM, Michael S. Tsirkin wrote: > >On Sun, Oct 11, 2015 at 11:52:32AM +0800, Xiao Guangrong wrote: > >>Changelog in v3: > >>There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo, > >>Michael for their valuable comments, the patchset finally gets better shape. > > > >Thanks! > >This needs some changes in coding style, and more comments, to > >make it easier to maintain going forward. > > Thanks for your review, Michael. I have learned lots of thing from > your comments. > > > > >High level comments - I didn't point out all instances, > >please go over code and locate them yourself. > >I focused on acpi code in this review. > > Okay, will do. > > > > > - fix coding style violations, prefix eveything with nvdimm_ etc > > Actually i did not pay attention on naming the stuff which is only internally > used. Thank you for pointing it out and will fix it in next version. > > > - in apci code, avoid manual memory management/complex pointer math > > I am not very good at ACPI ASL/AML, could you please more detail? It's about C. For example: Foo *foo = acpi_data_push(table, sizeof *foo); Bar *foo = acpi_data_push(table, sizeof *bar); is pretty obviously safe, and it doesn't require you to do any calculations. char *buf = acpi_data_push(table, sizeof *foo + sizeof *bar); is worse, now you need: Bar *bar = (Bar *)(buf + sizeof *foo); which will corrupt memory if you get the size wrong in push. > > - comments are needed to document apis & explain what's going on > > - constants need comments too, refer to text that > > can be looked up in acpi spec verbatim > > Indeed, will document carefully. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v3 00/32] implement vNVDIMM
On Tue, Oct 13, 2015 at 9:03 PM, Xiao Guangrongwrote: >> Label-less DIMMs are tested as part of the unit test [1] and the >> "memmap=nn!ss" kernel parameter that registers a persistent-memory >> address range without a DIMM. What error do you see when label >> support is disabled? >> >> [1]: https://github.com/pmem/ndctl/blob/master/README.md >> > > After revert my commits on NVDIMM driver, yeah, it works. > > Okay, i will drop the namespace part and make it as label-less > instead. > > Thank you, Dan! > Good to hear. There are still cases where a guest would likely want to submit a _DSM, like retrieving address range scrub results from the ACPI0012 root device, so the ASL work is still needed. However, I think the bulk of the storage functionality can be had without storing/retrieving labels in the guest. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v3 00/32] implement vNVDIMM
On 10/13/2015 02:36 PM, Dan Williams wrote: On Mon, Oct 12, 2015 at 10:49 PM, Xiao Guangrongwrote: On 10/13/2015 11:38 AM, Dan Williams wrote: On Mon, Oct 12, 2015 at 8:14 PM, Xiao Guangrong wrote: On 10/13/2015 12:36 AM, Dan Williams wrote: Static namespaces can be emitted without a label. Linux needs this to support existing "label-less" bare metal NVDIMMs. This is Linux specific? As i did not see it has been documented in the spec... I expect most NVDIMMs, especially existing ones available today, do not have a label area. This is not Linux specific and ACPI 6 does not specify a label area, only the Intel DSM Interface Example. Yup, label data is accessed via DSM interface, the spec I mentioned is Intel DSM Interface Example. However, IIRC Linux NVDIMM driver refused to use the device if no DSM GET_LABEL support, are you going to update it? Label-less DIMMs are tested as part of the unit test [1] and the "memmap=nn!ss" kernel parameter that registers a persistent-memory address range without a DIMM. What error do you see when label support is disabled? [1]: https://github.com/pmem/ndctl/blob/master/README.md After revert my commits on NVDIMM driver, yeah, it works. Okay, i will drop the namespace part and make it as label-less instead. Thank you, Dan! -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v3 00/32] implement vNVDIMM
On Mon, Oct 12, 2015 at 10:49 PM, Xiao Guangrongwrote: > > > On 10/13/2015 11:38 AM, Dan Williams wrote: >> >> On Mon, Oct 12, 2015 at 8:14 PM, Xiao Guangrong >> wrote: >>> >>> On 10/13/2015 12:36 AM, Dan Williams wrote: Static namespaces can be emitted without a label. Linux needs this to support existing "label-less" bare metal NVDIMMs. >>> >>> >>> >>> This is Linux specific? As i did not see it has been documented in the >>> spec... >> >> >> I expect most NVDIMMs, especially existing ones available today, do >> not have a label area. This is not Linux specific and ACPI 6 does not >> specify a label area, only the Intel DSM Interface Example. >> > > Yup, label data is accessed via DSM interface, the spec I mentioned > is Intel DSM Interface Example. > > However, IIRC Linux NVDIMM driver refused to use the device if no > DSM GET_LABEL support, are you going to update it? Label-less DIMMs are tested as part of the unit test [1] and the "memmap=nn!ss" kernel parameter that registers a persistent-memory address range without a DIMM. What error do you see when label support is disabled? [1]: https://github.com/pmem/ndctl/blob/master/README.md -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 00/32] implement vNVDIMM
On Mon, 12 Oct 2015 11:06:20 +0800 Xiao Guangrongwrote: > > > On 10/12/2015 10:59 AM, Bharata B Rao wrote: > > Xiao, > > > > Are these patches present in any git tree so that they can be easily tried > > out. > > > > Sorry, currently no git tree out of my workspace is available :( Is it possible for you to put working tree on github? > > BTW, this patchset is based on top of the commit b37686f7e on qemu tree: > commit b37686f7e84b22cfaf7fd01ac5133f2617cc3027 > Merge: 8be6e62 98cf48f > Author: Peter Maydell > Date: Fri Oct 9 12:18:13 2015 +0100 > > Merge remote-tracking branch > 'remotes/stefanha/tags/tracing-pull-request' into staging > > Thanks. > > > Regards, > > Bharata. > > > > On Sun, Oct 11, 2015 at 9:22 AM, Xiao Guangrong > > wrote: > >> Changelog in v3: > >> There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo, > >> Michael for their valuable comments, the patchset finally gets better > >> shape. > >> - changes from Igor's comments: > >>1) abstract dimm device type from pc-dimm and create nvdimm device > >> based on > >> dimm, then it uses memory backend device as nvdimm's memory and NUMA > >> has > >> easily been implemented. > >>2) let file-backend device support any kind of filesystem not only for > >> hugetlbfs and let it work on file not only for directory which is > >> achieved by extending 'mem-path' - if it's a directory then it works > >> as > >> current behavior, otherwise if it's file then directly allocates > >> memory > >> from it. > >>3) we figure out a unused memory hole below 4G that is 0xFF0 ~ > >> 0xFFF0, this range is large enough for NVDIMM ACPI as build > >> 64-bit > >> ACPI SSDT/DSDT table will break windows XP. > >> BTW, only make SSDT.rev = 2 can not work since the width is only > >> depended > >> on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition > >> Block) > >> in ACPI spec: > >> | Note: For compatibility with ACPI versions before ACPI 2.0, the bit > >> | width of Integer objects is dependent on the ComplianceRevision of the > >> DSDT. > >> | If the ComplianceRevision is less than 2, all integers are restricted to > >> 32 > >> | bits. Otherwise, full 64-bit integers are used. The version of the DSDT > >> sets > >> | the global integer width for all integers, including integers in SSDTs. > >>4) use the lowest ACPI spec version to document AML terms. > >>5) use "nvdimm" as nvdimm device name instead of "pc-nvdimm" > >> > >> - changes from Stefan's comments: > >>1) do not do endian adjustment in-place since _DSM memory is visible to > >> guest > >>2) use target platform's target page size instead of fixed PAGE_SIZE > >> definition > >>3) lots of code style improvement and typo fixes. > >>4) live migration fix > >> - changes from Paolo's comments: > >>1) improve the name of memory region > >> > >> - other changes: > >>1) return exact buffer size for _DSM method instead of the page size. > >>2) introduce mutex in NVDIMM ACPI as the _DSM memory is shared by all > >> nvdimm > >> devices. > >>3) NUMA support > >>4) implement _FIT method > >>5) rename "configdata" to "reserve-label-data" > >>6) simplify _DSM arg3 determination > >>7) main changelog update to let it reflect v3. > >> > >> Changlog in v2: > >> - Use litten endian for DSM method, thanks for Stefan's suggestion > >> > >> - introduce a new parameter, @configdata, if it's false, Qemu will > >>build a static and readonly namespace in memory and use it serveing > >>for DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no > >>reserved region is needed at the end of the @file, it is good for > >>the user who want to pass whole nvdimm device and make its data > >>completely be visible to guest > >> > >> - divide the source code into separated files and add maintain info > >> > >> BTW, PCOMMIT virtualization on KVM side is work in progress, hopefully will > >> be posted on next week > >> > >> == Background == > >> NVDIMM (A Non-Volatile Dual In-line Memory Module) is going to be supported > >> on Intel's platform. They are discovered via ACPI and configured by _DSM > >> method of NVDIMM device in ACPI. There has some supporting documents which > >> can be found at: > >> ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf > >> NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf > >> DSM Interface Example: > >> http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf > >> Driver Writer's Guide: > >> http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf > >> > >> Currently, the NVDIMM driver has been merged into upstream Linux Kernel and > >> this patchset tries to enable it in virtualization field > >> > >> == Design == > >> NVDIMM
Re: [PATCH v3 00/32] implement vNVDIMM
On 10/12/2015 04:20 PM, Igor Mammedov wrote: On Mon, 12 Oct 2015 11:06:20 +0800 Xiao Guangrongwrote: On 10/12/2015 10:59 AM, Bharata B Rao wrote: Xiao, Are these patches present in any git tree so that they can be easily tried out. Sorry, currently no git tree out of my workspace is available :( Is it possible for you to put working tree on github? Yes. I am planning do this. Hopefully, i will publish the git repo alone with the new version patchset. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 00/32] implement vNVDIMM
On Sun, Oct 11, 2015 at 9:33 PM, Xiao Guangrongwrote: > > > On 10/11/2015 05:17 AM, Dan Williams wrote: >> >> On Sat, Oct 10, 2015 at 8:52 PM, Xiao Guangrong >> wrote: >> [..] >>> >>> == Test == >>> In host >>> 1) create memory backed file, e.g # dd if=zero of=/tmp/nvdimm bs=1G >>> count=10 >>> 2) append "-object memory-backend-file,share,id=mem1, >>> mem-path=/tmp/nvdimm -device nvdimm,memdev=mem1,reserve-label-data, >>> id=nv1" in QEMU command line >>> >>> In guest, download the latest upsteam kernel (4.2 merge window) and >>> enable >>> ACPI_NFIT, LIBNVDIMM and BLK_DEV_PMEM. >>> 1) insmod drivers/nvdimm/libnvdimm.ko >>> 2) insmod drivers/acpi/nfit.ko >>> 3) insmod drivers/nvdimm/nd_btt.ko >>> 4) insmod drivers/nvdimm/nd_pmem.ko >>> You can see the whole nvdimm device used as a single namespace and >>> /dev/pmem0 >>> appears. You can do whatever on /dev/pmem0 including DAX access. >>> >>> Currently Linux NVDIMM driver does not support namespace operation on >>> this >>> kind of PMEM, apply below changes to support dynamical namespace: >>> >>> @@ -798,7 +823,8 @@ static int acpi_nfit_register_dimms(struct >>> acpi_nfit_desc *a >>> continue; >>> } >>> >>> - if (nfit_mem->bdw && nfit_mem->memdev_pmem) >>> + //if (nfit_mem->bdw && nfit_mem->memdev_pmem) >>> + if (nfit_mem->memdev_pmem) >>> flags |= NDD_ALIASING; >> >> >> This is just for testing purposes, right? I expect guests can > > > It's used to validate NVDIMM _DSM method and static namespace following > NVDIMM specs... Static namespaces can be emitted without a label. Linux needs this to support existing "label-less" bare metal NVDIMMs. >> sub-divide persistent memory capacity by partitioning the resulting >> block device(s). > > > I understand that it's a Linux design... Hmm, can the same expectation > apply to PBLK? BLK-mode is a bit different as those namespaces have both configurable sector-size and an optional BTT. It is possible to expect multiple BLK namespaces per a given region with different settings. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 00/32] implement vNVDIMM
On Sun, Oct 11, 2015 at 11:52:32AM +0800, Xiao Guangrong wrote: > Changelog in v3: > There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo, > Michael for their valuable comments, the patchset finally gets better shape. Thanks! This needs some changes in coding style, and more comments, to make it easier to maintain going forward. High level comments - I didn't point out all instances, please go over code and locate them yourself. I focused on acpi code in this review. - fix coding style violations, prefix eveything with nvdimm_ etc - in apci code, avoid manual memory management/complex pointer math - comments are needed to document apis & explain what's going on - constants need comments too, refer to text that can be looked up in acpi spec verbatim > - changes from Igor's comments: > 1) abstract dimm device type from pc-dimm and create nvdimm device based on > dimm, then it uses memory backend device as nvdimm's memory and NUMA has > easily been implemented. > 2) let file-backend device support any kind of filesystem not only for > hugetlbfs and let it work on file not only for directory which is > achieved by extending 'mem-path' - if it's a directory then it works as > current behavior, otherwise if it's file then directly allocates memory > from it. > 3) we figure out a unused memory hole below 4G that is 0xFF0 ~ > 0xFFF0, this range is large enough for NVDIMM ACPI as build 64-bit > ACPI SSDT/DSDT table will break windows XP. > BTW, only make SSDT.rev = 2 can not work since the width is only depended > on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition Block) > in ACPI spec: > | Note: For compatibility with ACPI versions before ACPI 2.0, the bit > | width of Integer objects is dependent on the ComplianceRevision of the DSDT. > | If the ComplianceRevision is less than 2, all integers are restricted to 32 > | bits. Otherwise, full 64-bit integers are used. The version of the DSDT > sets > | the global integer width for all integers, including integers in SSDTs. > 4) use the lowest ACPI spec version to document AML terms. > 5) use "nvdimm" as nvdimm device name instead of "pc-nvdimm" > > - changes from Stefan's comments: > 1) do not do endian adjustment in-place since _DSM memory is visible to > guest > 2) use target platform's target page size instead of fixed PAGE_SIZE > definition > 3) lots of code style improvement and typo fixes. > 4) live migration fix > - changes from Paolo's comments: > 1) improve the name of memory region > > - other changes: > 1) return exact buffer size for _DSM method instead of the page size. > 2) introduce mutex in NVDIMM ACPI as the _DSM memory is shared by all nvdimm > devices. > 3) NUMA support > 4) implement _FIT method > 5) rename "configdata" to "reserve-label-data" > 6) simplify _DSM arg3 determination > 7) main changelog update to let it reflect v3. > > Changlog in v2: > - Use litten endian for DSM method, thanks for Stefan's suggestion > > - introduce a new parameter, @configdata, if it's false, Qemu will > build a static and readonly namespace in memory and use it serveing > for DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no > reserved region is needed at the end of the @file, it is good for > the user who want to pass whole nvdimm device and make its data > completely be visible to guest > > - divide the source code into separated files and add maintain info > > BTW, PCOMMIT virtualization on KVM side is work in progress, hopefully will > be posted on next week > > == Background == > NVDIMM (A Non-Volatile Dual In-line Memory Module) is going to be supported > on Intel's platform. They are discovered via ACPI and configured by _DSM > method of NVDIMM device in ACPI. There has some supporting documents which > can be found at: > ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf > NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf > DSM Interface Example: > http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf > Driver Writer's Guide: > http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf > > Currently, the NVDIMM driver has been merged into upstream Linux Kernel and > this patchset tries to enable it in virtualization field > > == Design == > NVDIMM supports two mode accesses, one is PMEM which maps NVDIMM into CPU's > address space then CPU can directly access it as normal memory, another is > BLK which is used as block device to reduce the occupying of CPU address > space > > BLK mode accesses NVDIMM via Command Register window and Data Register window. > BLK virtualization has high workload since each sector access will cause at > least two VM-EXIT. So we currently only imperilment vPMEM in this patchset > > --- vPMEM design --- > We introduce a new device named "nvdimm", it uses memory
Re: [Qemu-devel] [PATCH v3 00/32] implement vNVDIMM
On 10/13/2015 11:38 AM, Dan Williams wrote: On Mon, Oct 12, 2015 at 8:14 PM, Xiao Guangrongwrote: On 10/13/2015 12:36 AM, Dan Williams wrote: Static namespaces can be emitted without a label. Linux needs this to support existing "label-less" bare metal NVDIMMs. This is Linux specific? As i did not see it has been documented in the spec... I expect most NVDIMMs, especially existing ones available today, do not have a label area. This is not Linux specific and ACPI 6 does not specify a label area, only the Intel DSM Interface Example. Yup, label data is accessed via DSM interface, the spec I mentioned is Intel DSM Interface Example. However, IIRC Linux NVDIMM driver refused to use the device if no DSM GET_LABEL support, are you going to update it? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v3 00/32] implement vNVDIMM
On 10/13/2015 12:36 AM, Dan Williams wrote: On Sun, Oct 11, 2015 at 9:33 PM, Xiao Guangrongwrote: On 10/11/2015 05:17 AM, Dan Williams wrote: On Sat, Oct 10, 2015 at 8:52 PM, Xiao Guangrong wrote: [..] == Test == In host 1) create memory backed file, e.g # dd if=zero of=/tmp/nvdimm bs=1G count=10 2) append "-object memory-backend-file,share,id=mem1, mem-path=/tmp/nvdimm -device nvdimm,memdev=mem1,reserve-label-data, id=nv1" in QEMU command line In guest, download the latest upsteam kernel (4.2 merge window) and enable ACPI_NFIT, LIBNVDIMM and BLK_DEV_PMEM. 1) insmod drivers/nvdimm/libnvdimm.ko 2) insmod drivers/acpi/nfit.ko 3) insmod drivers/nvdimm/nd_btt.ko 4) insmod drivers/nvdimm/nd_pmem.ko You can see the whole nvdimm device used as a single namespace and /dev/pmem0 appears. You can do whatever on /dev/pmem0 including DAX access. Currently Linux NVDIMM driver does not support namespace operation on this kind of PMEM, apply below changes to support dynamical namespace: @@ -798,7 +823,8 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *a continue; } - if (nfit_mem->bdw && nfit_mem->memdev_pmem) + //if (nfit_mem->bdw && nfit_mem->memdev_pmem) + if (nfit_mem->memdev_pmem) flags |= NDD_ALIASING; This is just for testing purposes, right? I expect guests can It's used to validate NVDIMM _DSM method and static namespace following NVDIMM specs... Static namespaces can be emitted without a label. Linux needs this to support existing "label-less" bare metal NVDIMMs. This is Linux specific? As i did not see it has been documented in the spec... sub-divide persistent memory capacity by partitioning the resulting block device(s). I understand that it's a Linux design... Hmm, can the same expectation apply to PBLK? BLK-mode is a bit different as those namespaces have both configurable sector-size and an optional BTT. It is possible to expect multiple BLK namespaces per a given region with different settings. Okay, thanks for your nice explanation, Dan! -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 00/32] implement vNVDIMM
On 10/12/2015 07:55 PM, Michael S. Tsirkin wrote: On Sun, Oct 11, 2015 at 11:52:32AM +0800, Xiao Guangrong wrote: Changelog in v3: There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo, Michael for their valuable comments, the patchset finally gets better shape. Thanks! This needs some changes in coding style, and more comments, to make it easier to maintain going forward. Thanks for your review, Michael. I have learned lots of thing from your comments. High level comments - I didn't point out all instances, please go over code and locate them yourself. I focused on acpi code in this review. Okay, will do. - fix coding style violations, prefix eveything with nvdimm_ etc Actually i did not pay attention on naming the stuff which is only internally used. Thank you for pointing it out and will fix it in next version. - in apci code, avoid manual memory management/complex pointer math I am not very good at ACPI ASL/AML, could you please more detail? - comments are needed to document apis & explain what's going on - constants need comments too, refer to text that can be looked up in acpi spec verbatim Indeed, will document carefully. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 00/32] implement vNVDIMM
On Tue, Oct 13, 2015 at 01:29:48PM +0800, Xiao Guangrong wrote: > > > On 10/12/2015 07:55 PM, Michael S. Tsirkin wrote: > >On Sun, Oct 11, 2015 at 11:52:32AM +0800, Xiao Guangrong wrote: > >>Changelog in v3: > >>There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo, > >>Michael for their valuable comments, the patchset finally gets better shape. > > > >Thanks! > >This needs some changes in coding style, and more comments, to > >make it easier to maintain going forward. > > Thanks for your review, Michael. I have learned lots of thing from > your comments. > > > > >High level comments - I didn't point out all instances, > >please go over code and locate them yourself. > >I focused on acpi code in this review. > > Okay, will do. > > > > > - fix coding style violations, prefix eveything with nvdimm_ etc > > Actually i did not pay attention on naming the stuff which is only internally > used. Thank you for pointing it out and will fix it in next version. > > > - in apci code, avoid manual memory management/complex pointer math > > I am not very good at ACPI ASL/AML, could you please more detail? It's about C. For example: Foo *foo = acpi_data_push(table, sizeof *foo); Bar *foo = acpi_data_push(table, sizeof *bar); is pretty obviously safe, and it doesn't require you to do any calculations. char *buf = acpi_data_push(table, sizeof *foo + sizeof *bar); is worse, now you need: Bar *bar = (Bar *)(buf + sizeof *foo); which will corrupt memory if you get the size wrong in push. > > - comments are needed to document apis & explain what's going on > > - constants need comments too, refer to text that > > can be looked up in acpi spec verbatim > > Indeed, will document carefully. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 00/32] implement vNVDIMM
On 10/13/2015 01:57 PM, Michael S. Tsirkin wrote: On Tue, Oct 13, 2015 at 01:29:48PM +0800, Xiao Guangrong wrote: On 10/12/2015 07:55 PM, Michael S. Tsirkin wrote: On Sun, Oct 11, 2015 at 11:52:32AM +0800, Xiao Guangrong wrote: Changelog in v3: There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo, Michael for their valuable comments, the patchset finally gets better shape. Thanks! This needs some changes in coding style, and more comments, to make it easier to maintain going forward. Thanks for your review, Michael. I have learned lots of thing from your comments. High level comments - I didn't point out all instances, please go over code and locate them yourself. I focused on acpi code in this review. Okay, will do. - fix coding style violations, prefix eveything with nvdimm_ etc Actually i did not pay attention on naming the stuff which is only internally used. Thank you for pointing it out and will fix it in next version. - in apci code, avoid manual memory management/complex pointer math I am not very good at ACPI ASL/AML, could you please more detail? It's about C. For example: Foo *foo = acpi_data_push(table, sizeof *foo); Bar *foo = acpi_data_push(table, sizeof *bar); is pretty obviously safe, and it doesn't require you to do any calculations. char *buf = acpi_data_push(table, sizeof *foo + sizeof *bar); is worse, now you need: Bar *bar = (Bar *)(buf + sizeof *foo); which will corrupt memory if you get the size wrong in push. Ah, got it. :) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v3 00/32] implement vNVDIMM
On Mon, Oct 12, 2015 at 8:14 PM, Xiao Guangrongwrote: > On 10/13/2015 12:36 AM, Dan Williams wrote: >> Static namespaces can be emitted without a label. Linux needs this to >> support existing "label-less" bare metal NVDIMMs. > > > This is Linux specific? As i did not see it has been documented in the > spec... I expect most NVDIMMs, especially existing ones available today, do not have a label area. This is not Linux specific and ACPI 6 does not specify a label area, only the Intel DSM Interface Example. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 00/32] implement vNVDIMM
Xiao, Are these patches present in any git tree so that they can be easily tried out. Regards, Bharata. On Sun, Oct 11, 2015 at 9:22 AM, Xiao Guangrongwrote: > Changelog in v3: > There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo, > Michael for their valuable comments, the patchset finally gets better shape. > - changes from Igor's comments: > 1) abstract dimm device type from pc-dimm and create nvdimm device based on > dimm, then it uses memory backend device as nvdimm's memory and NUMA has > easily been implemented. > 2) let file-backend device support any kind of filesystem not only for > hugetlbfs and let it work on file not only for directory which is > achieved by extending 'mem-path' - if it's a directory then it works as > current behavior, otherwise if it's file then directly allocates memory > from it. > 3) we figure out a unused memory hole below 4G that is 0xFF0 ~ > 0xFFF0, this range is large enough for NVDIMM ACPI as build 64-bit > ACPI SSDT/DSDT table will break windows XP. > BTW, only make SSDT.rev = 2 can not work since the width is only depended > on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition Block) > in ACPI spec: > | Note: For compatibility with ACPI versions before ACPI 2.0, the bit > | width of Integer objects is dependent on the ComplianceRevision of the DSDT. > | If the ComplianceRevision is less than 2, all integers are restricted to 32 > | bits. Otherwise, full 64-bit integers are used. The version of the DSDT sets > | the global integer width for all integers, including integers in SSDTs. > 4) use the lowest ACPI spec version to document AML terms. > 5) use "nvdimm" as nvdimm device name instead of "pc-nvdimm" > > - changes from Stefan's comments: > 1) do not do endian adjustment in-place since _DSM memory is visible to > guest > 2) use target platform's target page size instead of fixed PAGE_SIZE > definition > 3) lots of code style improvement and typo fixes. > 4) live migration fix > - changes from Paolo's comments: > 1) improve the name of memory region > > - other changes: > 1) return exact buffer size for _DSM method instead of the page size. > 2) introduce mutex in NVDIMM ACPI as the _DSM memory is shared by all nvdimm > devices. > 3) NUMA support > 4) implement _FIT method > 5) rename "configdata" to "reserve-label-data" > 6) simplify _DSM arg3 determination > 7) main changelog update to let it reflect v3. > > Changlog in v2: > - Use litten endian for DSM method, thanks for Stefan's suggestion > > - introduce a new parameter, @configdata, if it's false, Qemu will > build a static and readonly namespace in memory and use it serveing > for DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no > reserved region is needed at the end of the @file, it is good for > the user who want to pass whole nvdimm device and make its data > completely be visible to guest > > - divide the source code into separated files and add maintain info > > BTW, PCOMMIT virtualization on KVM side is work in progress, hopefully will > be posted on next week > > == Background == > NVDIMM (A Non-Volatile Dual In-line Memory Module) is going to be supported > on Intel's platform. They are discovered via ACPI and configured by _DSM > method of NVDIMM device in ACPI. There has some supporting documents which > can be found at: > ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf > NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf > DSM Interface Example: > http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf > Driver Writer's Guide: > http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf > > Currently, the NVDIMM driver has been merged into upstream Linux Kernel and > this patchset tries to enable it in virtualization field > > == Design == > NVDIMM supports two mode accesses, one is PMEM which maps NVDIMM into CPU's > address space then CPU can directly access it as normal memory, another is > BLK which is used as block device to reduce the occupying of CPU address > space > > BLK mode accesses NVDIMM via Command Register window and Data Register window. > BLK virtualization has high workload since each sector access will cause at > least two VM-EXIT. So we currently only imperilment vPMEM in this patchset > > --- vPMEM design --- > We introduce a new device named "nvdimm", it uses memory backend device as > NVDIMM memory. The file in file-backend device can be a regular file and block > device. We can use any file when we do test or emulation, however, > in the real word, the files passed to guest are: > - the regular file in the filesystem with DAX enabled created on NVDIMM device > on host > - the raw PMEM device on host, e,g /dev/pmem0 > Memory access on the address created by mmap on these kinds of files can > directly reach NVDIMM
Re: [PATCH v3 00/32] implement vNVDIMM
On 10/12/2015 10:59 AM, Bharata B Rao wrote: Xiao, Are these patches present in any git tree so that they can be easily tried out. Sorry, currently no git tree out of my workspace is available :( BTW, this patchset is based on top of the commit b37686f7e on qemu tree: commit b37686f7e84b22cfaf7fd01ac5133f2617cc3027 Merge: 8be6e62 98cf48f Author: Peter MaydellDate: Fri Oct 9 12:18:13 2015 +0100 Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging Thanks. Regards, Bharata. On Sun, Oct 11, 2015 at 9:22 AM, Xiao Guangrong wrote: Changelog in v3: There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo, Michael for their valuable comments, the patchset finally gets better shape. - changes from Igor's comments: 1) abstract dimm device type from pc-dimm and create nvdimm device based on dimm, then it uses memory backend device as nvdimm's memory and NUMA has easily been implemented. 2) let file-backend device support any kind of filesystem not only for hugetlbfs and let it work on file not only for directory which is achieved by extending 'mem-path' - if it's a directory then it works as current behavior, otherwise if it's file then directly allocates memory from it. 3) we figure out a unused memory hole below 4G that is 0xFF0 ~ 0xFFF0, this range is large enough for NVDIMM ACPI as build 64-bit ACPI SSDT/DSDT table will break windows XP. BTW, only make SSDT.rev = 2 can not work since the width is only depended on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition Block) in ACPI spec: | Note: For compatibility with ACPI versions before ACPI 2.0, the bit | width of Integer objects is dependent on the ComplianceRevision of the DSDT. | If the ComplianceRevision is less than 2, all integers are restricted to 32 | bits. Otherwise, full 64-bit integers are used. The version of the DSDT sets | the global integer width for all integers, including integers in SSDTs. 4) use the lowest ACPI spec version to document AML terms. 5) use "nvdimm" as nvdimm device name instead of "pc-nvdimm" - changes from Stefan's comments: 1) do not do endian adjustment in-place since _DSM memory is visible to guest 2) use target platform's target page size instead of fixed PAGE_SIZE definition 3) lots of code style improvement and typo fixes. 4) live migration fix - changes from Paolo's comments: 1) improve the name of memory region - other changes: 1) return exact buffer size for _DSM method instead of the page size. 2) introduce mutex in NVDIMM ACPI as the _DSM memory is shared by all nvdimm devices. 3) NUMA support 4) implement _FIT method 5) rename "configdata" to "reserve-label-data" 6) simplify _DSM arg3 determination 7) main changelog update to let it reflect v3. Changlog in v2: - Use litten endian for DSM method, thanks for Stefan's suggestion - introduce a new parameter, @configdata, if it's false, Qemu will build a static and readonly namespace in memory and use it serveing for DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no reserved region is needed at the end of the @file, it is good for the user who want to pass whole nvdimm device and make its data completely be visible to guest - divide the source code into separated files and add maintain info BTW, PCOMMIT virtualization on KVM side is work in progress, hopefully will be posted on next week == Background == NVDIMM (A Non-Volatile Dual In-line Memory Module) is going to be supported on Intel's platform. They are discovered via ACPI and configured by _DSM method of NVDIMM device in ACPI. There has some supporting documents which can be found at: ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf Driver Writer's Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf Currently, the NVDIMM driver has been merged into upstream Linux Kernel and this patchset tries to enable it in virtualization field == Design == NVDIMM supports two mode accesses, one is PMEM which maps NVDIMM into CPU's address space then CPU can directly access it as normal memory, another is BLK which is used as block device to reduce the occupying of CPU address space BLK mode accesses NVDIMM via Command Register window and Data Register window. BLK virtualization has high workload since each sector access will cause at least two VM-EXIT. So we currently only imperilment vPMEM in this patchset --- vPMEM design --- We introduce a new device named "nvdimm", it uses memory backend device as NVDIMM memory. The file in file-backend device can be a regular file and block device. We can use any file when we do
Re: [PATCH v3 00/32] implement vNVDIMM
On 10/11/2015 05:17 AM, Dan Williams wrote: On Sat, Oct 10, 2015 at 8:52 PM, Xiao Guangrongwrote: [..] == Test == In host 1) create memory backed file, e.g # dd if=zero of=/tmp/nvdimm bs=1G count=10 2) append "-object memory-backend-file,share,id=mem1, mem-path=/tmp/nvdimm -device nvdimm,memdev=mem1,reserve-label-data, id=nv1" in QEMU command line In guest, download the latest upsteam kernel (4.2 merge window) and enable ACPI_NFIT, LIBNVDIMM and BLK_DEV_PMEM. 1) insmod drivers/nvdimm/libnvdimm.ko 2) insmod drivers/acpi/nfit.ko 3) insmod drivers/nvdimm/nd_btt.ko 4) insmod drivers/nvdimm/nd_pmem.ko You can see the whole nvdimm device used as a single namespace and /dev/pmem0 appears. You can do whatever on /dev/pmem0 including DAX access. Currently Linux NVDIMM driver does not support namespace operation on this kind of PMEM, apply below changes to support dynamical namespace: @@ -798,7 +823,8 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *a continue; } - if (nfit_mem->bdw && nfit_mem->memdev_pmem) + //if (nfit_mem->bdw && nfit_mem->memdev_pmem) + if (nfit_mem->memdev_pmem) flags |= NDD_ALIASING; This is just for testing purposes, right? I expect guests can It's used to validate NVDIMM _DSM method and static namespace following NVDIMM specs... sub-divide persistent memory capacity by partitioning the resulting block device(s). I understand that it's a Linux design... Hmm, can the same expectation apply to PBLK? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 00/32] implement vNVDIMM
Changelog in v3: There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo, Michael for their valuable comments, the patchset finally gets better shape. - changes from Igor's comments: 1) abstract dimm device type from pc-dimm and create nvdimm device based on dimm, then it uses memory backend device as nvdimm's memory and NUMA has easily been implemented. 2) let file-backend device support any kind of filesystem not only for hugetlbfs and let it work on file not only for directory which is achieved by extending 'mem-path' - if it's a directory then it works as current behavior, otherwise if it's file then directly allocates memory from it. 3) we figure out a unused memory hole below 4G that is 0xFF0 ~ 0xFFF0, this range is large enough for NVDIMM ACPI as build 64-bit ACPI SSDT/DSDT table will break windows XP. BTW, only make SSDT.rev = 2 can not work since the width is only depended on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition Block) in ACPI spec: | Note: For compatibility with ACPI versions before ACPI 2.0, the bit | width of Integer objects is dependent on the ComplianceRevision of the DSDT. | If the ComplianceRevision is less than 2, all integers are restricted to 32 | bits. Otherwise, full 64-bit integers are used. The version of the DSDT sets | the global integer width for all integers, including integers in SSDTs. 4) use the lowest ACPI spec version to document AML terms. 5) use "nvdimm" as nvdimm device name instead of "pc-nvdimm" - changes from Stefan's comments: 1) do not do endian adjustment in-place since _DSM memory is visible to guest 2) use target platform's target page size instead of fixed PAGE_SIZE definition 3) lots of code style improvement and typo fixes. 4) live migration fix - changes from Paolo's comments: 1) improve the name of memory region - other changes: 1) return exact buffer size for _DSM method instead of the page size. 2) introduce mutex in NVDIMM ACPI as the _DSM memory is shared by all nvdimm devices. 3) NUMA support 4) implement _FIT method 5) rename "configdata" to "reserve-label-data" 6) simplify _DSM arg3 determination 7) main changelog update to let it reflect v3. Changlog in v2: - Use litten endian for DSM method, thanks for Stefan's suggestion - introduce a new parameter, @configdata, if it's false, Qemu will build a static and readonly namespace in memory and use it serveing for DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no reserved region is needed at the end of the @file, it is good for the user who want to pass whole nvdimm device and make its data completely be visible to guest - divide the source code into separated files and add maintain info BTW, PCOMMIT virtualization on KVM side is work in progress, hopefully will be posted on next week == Background == NVDIMM (A Non-Volatile Dual In-line Memory Module) is going to be supported on Intel's platform. They are discovered via ACPI and configured by _DSM method of NVDIMM device in ACPI. There has some supporting documents which can be found at: ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf Driver Writer's Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf Currently, the NVDIMM driver has been merged into upstream Linux Kernel and this patchset tries to enable it in virtualization field == Design == NVDIMM supports two mode accesses, one is PMEM which maps NVDIMM into CPU's address space then CPU can directly access it as normal memory, another is BLK which is used as block device to reduce the occupying of CPU address space BLK mode accesses NVDIMM via Command Register window and Data Register window. BLK virtualization has high workload since each sector access will cause at least two VM-EXIT. So we currently only imperilment vPMEM in this patchset --- vPMEM design --- We introduce a new device named "nvdimm", it uses memory backend device as NVDIMM memory. The file in file-backend device can be a regular file and block device. We can use any file when we do test or emulation, however, in the real word, the files passed to guest are: - the regular file in the filesystem with DAX enabled created on NVDIMM device on host - the raw PMEM device on host, e,g /dev/pmem0 Memory access on the address created by mmap on these kinds of files can directly reach NVDIMM device on host. --- vConfigure data area design --- Each NVDIMM device has a configure data area which is used to store label namespace data. In order to emulating this area, we divide the file into two parts: - first parts is (0, size - 128K], which is used as PMEM - 128K at the end of the file, which is used as Label Data Area So that the label namespace data can be
Re: [PATCH v3 00/32] implement vNVDIMM
On Sat, Oct 10, 2015 at 8:52 PM, Xiao Guangrongwrote: [..] > == Test == > In host > 1) create memory backed file, e.g # dd if=zero of=/tmp/nvdimm bs=1G count=10 > 2) append "-object memory-backend-file,share,id=mem1, >mem-path=/tmp/nvdimm -device nvdimm,memdev=mem1,reserve-label-data, >id=nv1" in QEMU command line > > In guest, download the latest upsteam kernel (4.2 merge window) and enable > ACPI_NFIT, LIBNVDIMM and BLK_DEV_PMEM. > 1) insmod drivers/nvdimm/libnvdimm.ko > 2) insmod drivers/acpi/nfit.ko > 3) insmod drivers/nvdimm/nd_btt.ko > 4) insmod drivers/nvdimm/nd_pmem.ko > You can see the whole nvdimm device used as a single namespace and /dev/pmem0 > appears. You can do whatever on /dev/pmem0 including DAX access. > > Currently Linux NVDIMM driver does not support namespace operation on this > kind of PMEM, apply below changes to support dynamical namespace: > > @@ -798,7 +823,8 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc > *a > continue; > } > > - if (nfit_mem->bdw && nfit_mem->memdev_pmem) > + //if (nfit_mem->bdw && nfit_mem->memdev_pmem) > + if (nfit_mem->memdev_pmem) > flags |= NDD_ALIASING; This is just for testing purposes, right? I expect guests can sub-divide persistent memory capacity by partitioning the resulting block device(s). -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html