Dan, thanks for the insights!
Can I say that a UCE is delivered from h/w to the OS in a single way in the machine check case, and that only the PMEM/DAX code filters out the UC address and manages it in its own way via badblocks? If PMEM/DAX doesn't do so, then the common RAS workflow will kick in, right? And for the function of this patchset, what happens when ARS is involved but no machine check is fired?

>-----Original Message-----
>From: Linux-nvdimm [mailto:[email protected]] On Behalf
>Of Dan Williams
>Sent: Friday, January 25, 2019 2:28 PM
>To: Jane Chu <[email protected]>
>Cc: Tom Lendacky <[email protected]>; Michal Hocko
><[email protected]>; linux-nvdimm <[email protected]>; Takashi
>Iwai <[email protected]>; Dave Hansen <[email protected]>; Huang,
>Ying <[email protected]>; Linux Kernel Mailing List
><[email protected]>; Linux MM <[email protected]>; Jérôme
>Glisse <[email protected]>; Borislav Petkov <[email protected]>; Yaowei Bai
><[email protected]>; Ross Zwisler <[email protected]>;
>Bjorn Helgaas <[email protected]>; Andrew Morton
><[email protected]>; Wu, Fengguang <[email protected]>
>Subject: Re: [PATCH 5/5] dax: "Hotplug" persistent memory for use like
>normal RAM
>
>On Thu, Jan 24, 2019 at 10:13 PM Jane Chu <[email protected]> wrote:
>>
>> Hi, Dave,
>>
>> While chatting with my colleague Erwin about the patchset, it occurred
>> that we're not clear about the error handling part. Specifically,
>>
>> 1. If an uncorrectable error is detected during a 'load' in the hot
>> plugged pmem region, how will the error be handled? will it be
>> handled like PMEM or DRAM?
>
>DRAM.
>
>> 2. If a poison is set, and is persistent, which entity should clear
>> the poison, and badblock(if applicable)? If it's user's responsibility,
>> does ndctl support the clearing in this mode?
>
>With persistent memory advertised via a static logical-to-physical
>storage/dax device mapping, once an error develops it destroys a
>physical *and* logical part of a device address space. That loss of
>logical address space makes error clearing a necessity. However, with
>the DRAM / "System RAM" error handling model, the OS can just offline
>the page and map a different one to repair the logical address space.
>So, no, ndctl will not have explicit enabling to clear volatile
>errors, the OS will just dynamically offline problematic pages.

