RE: [Linux-nvdimm] another pmem variant

Brooks, Adam J Wed, 25 Mar 2015 11:10:55 -0700

>The other two patches are a heavily rewritten version of the code that
>Intel gave to various storage vendors to discover the type 12 (and earlier
>type 6) nvdimms, which I massaged into a form that is hopefully suitable
>for mainline.


The problem is that the e820 or the UEFI Memory Map Table on their own are 
really bad ways to represent NVDIMMs.  The memory table idea was originally 
developed 6 years ago prior to NVDIMMs existing.  It was used to define 
traditional battery backed memory.  With traditional battery backed memory 
either the whole region was going to be valid or the whole region was going to 
be gone.  There was also no concept of arming.  You simply had x hours of data 
retention based on your battery being y% charged.  Fast forward a couple of 
years, and we continued using the memory table method for something called Copy 
To Flash, where the CPU would copy memory from the DIMMs to an SSD of some sort.  
Again this was a whole-region-or-nothing solution, and because we 
were typically using SATA SSDs there was no need to "arm" anything.  
Additionally the restore operation (and even the save operation if you were 
brave enough) could be done from the OS.  Therefore there was no need for the 
BIOS to pass up any status regarding if the recovery was successful or not.

Fast forward again to the present day and NVDIMMs.  We used the memory table 
model initially for NVDIMMs because 1) the BIOS code was already in place and 
2) we had a non-upstreamed driver (something that predated pmem by several 
years, called ADRBD).  In a perfect world where there are no hardware failures, 
e820+ADRBD works great for NVDIMMs.  However, in the real world where there are 
failures, it has a number of shortcomings.  Mainly there are the following 
issues with it:
1) The region may now be composed of 2+ different NVDIMMs that have different 
statuses.  A subset of the NVDIMMs may have failed the restore.  An NVDIMM may 
have been added since the last save/restore of the existing NVDIMMs.
2) Just based on the e820 table, the OS has no way of knowing where the 
boundaries of the NVDIMMs are.  It has no way of knowing if they are all 
interleaved together, where a failure of a single NVDIMM means the loss of the 
whole region, or if the NVDIMMs are non-interleaved and can be treated as 
separate memory regions to prevent the failure of one NVDIMM from causing data 
to be lost from all NVDIMMs.
3) Due to the requirement to restore the MRS/RC registers, the NVDIMM restore 
must be done from the BIOS.  Depending on the security settings of the platform, 
the OS may not be able to directly interrogate the individual NVDIMMs to find 
their status.  Even if the OS can get to the NVDIMM over SMBus, all information 
about the status of the last restore attempt may have been wiped if the BIOS 
was also configured to do the erase/arm operation.
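
To make the points above concrete, here is a rough C sketch.  The first struct 
shows roughly all that a flat memory map entry carries: a base, a length and a 
type.  The second is a purely hypothetical per-NVDIMM record.  The names 
nvdimm_info, interleave_set, nvdimm_restore_status and nvdimm_data_valid are 
made up for illustration and do not come from any specification or from the 
patches under discussion; they only show the kind of information the OS would 
need.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Roughly what a flat memory map entry conveys: a range and a type. */
struct flat_map_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;   /* e.g. the type 12 value discussed in this thread */
};

/* HYPOTHETICAL: outcome of the last BIOS restore for one NVDIMM. */
enum nvdimm_restore_status {
    NVDIMM_RESTORE_OK,
    NVDIMM_RESTORE_FAILED,
    NVDIMM_NEWLY_ADDED   /* never saved, so there is nothing valid to restore */
};

/* HYPOTHETICAL: per-NVDIMM record that a richer table could expose. */
struct nvdimm_info {
    unsigned interleave_set;   /* DIMMs sharing an id are interleaved together */
    enum nvdimm_restore_status status;
};

/*
 * Data on DIMM idx can be trusted only if every member of its interleave
 * set restored successfully.  A non-interleaved DIMM is alone in its set,
 * so a failure elsewhere does not affect it.
 */
static bool nvdimm_data_valid(const struct nvdimm_info *dimms, size_t n,
                              size_t idx)
{
    for (size_t i = 0; i < n; i++) {
        if (dimms[i].interleave_set == dimms[idx].interleave_set &&
            dimms[i].status != NVDIMM_RESTORE_OK)
            return false;
    }
    return true;
}
```

With only flat_map_entry the OS cannot make this decision at all: a single 
range covering three NVDIMMs looks the same whether one of them failed its 
restore or not, and whether they are interleaved or not.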

For those reasons (and more) simply using the current memory tables is not a 
good solution. A more detailed NVDIMM specific table is required to surface the 
status and configuration of the NVDIMMs.  Unfortunately that table has been 
perpetually delayed, and as a result people are trying to move forward with Type 
12.  I understand why this has been done, and for highly embedded storage 
appliances it is fine, because those users probably inherently know the 
configuration of the NVDIMMs.  However, for general purpose systems where the 
user has no way of knowing the exact configuration of the DIMMs, just using the 
e820 or UEFI Memory Map table is not sufficient.
