On Fri, 2019-01-04 at 01:56 +0000, Elliott, Robert (Persistent Memory) wrote:
> > -----Original Message-----
> > From: Verma, Vishal L <vishal.l.ve...@intel.com>
> > Sent: Thursday, January 3, 2019 6:03 PM
> > To: kamalkakri2...@yahoo.com; linux-nvdimm@lists.01.org; Elliott, Robert
> > (Persistent Memory) <elli...@hpe.com>
> > Subject: Re: Question on Error Injection
> > 
> > 
> > On Thu, 2019-01-03 at 22:30 +0000, Elliott, Robert (Persistent Memory)
> > wrote:
> > > > -----Original Message-----
> > > > From: Linux-nvdimm <linux-nvdimm-boun...@lists.01.org> On Behalf Of
> > > > Verma, Vishal L
> > > > Sent: Thursday, January 3, 2019 3:27 PM
> > > > To: kamalkakri2...@yahoo.com; linux-nvdimm@lists.01.org
> > > > Subject: Re: Question on Error Injection
> > > > 
> > > > 
> > > > On Thu, 2019-01-03 at 20:02 +0000, Kamal Kakri wrote:
> > > > > My device has errors injected:
> > > > > # ndctl inject-error --status namespace2.0
> > > > > {
> > > > >   "badblocks":[
> > > > >     {
> > > > >       "block":35000,
> > > > >       "count":10
> > > > >     }
> > > > >   ]
> > > > > }
> > > > > 
> > > > > No problem reading from the bad offsets:
> > > > > # dd if=/dev/pmem2 of=/tmp/pmem_out bs=512 count=10 skip=35000
> > > > > 10+0 records in
> > > > > 10+0 records out
> > > > > 5120 bytes (5.1 kB) copied, 0.000108226 s, 47.3 MB/s
> > > > 
> > > > Did you ever read from /dev/pmem2 before injecting the error? There is
> > > > a possibility that the page is already present in the page cache and
> > > > the read gets serviced from there. You can set iflag=direct to ensure
> > > > you're reading from the device.
> > > > 
> > > > Other than that, there /should/ have been an MCE/sigbus in this case.
> > > > I'd check with your hardware/platform vendor to ensure machine checks
> > > > are available, and to ensure that injecting an error does result in a
> > > > memory error/poison consumption by the CPU.
> > > 
> > > An application like dd making traditional read() calls should see
> > > them fail and report it like this:
> > >   dd: error reading '/dev/pmem2': Input/output error
> > > 
> > > The application itself shouldn't be terminated with SIGBUS - that's
> > > for an application doing memory accesses that cannot be resolved.
> > 
> > Correct, an IO error will occur when the error is 'known', i.e. in
> > badblocks. But in the case of injection with --no-notify, the error is
> > a latent one, so the kernel can't prevent access to the poison
> > location. In this case the application will receive a SIGBUS when
> > the page is faulted in.
> 
> memcpy_mcsafe() used by the kernel pmem driver is designed to tolerate
> those problems. There will be a machine check so the kernel can record
> the bad block, but it won't be seen by the application except as read()
> failing.
> 
Ah yes I keep forgetting about that :)
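
For reference, the check discussed in this thread can be sketched as a
pair of shell helpers. This is only a minimal sketch, not a definitive
procedure: the function names badblock_byte_range and
read_badblock_direct are hypothetical, and it assumes badblocks entries
are counted in 512-byte sectors, as the dd command quoted above
(bs=512 skip=35000 count=10) implies.

```shell
# Convert a badblocks entry to "byte_offset byte_length".
# Assumption: block and count are in 512-byte sectors.
badblock_byte_range() {
    block=$1
    count=$2
    echo "$((block * 512)) $((count * 512))"
}

# Re-read the range with O_DIRECT so the page cache cannot satisfy the
# read. Returns dd's exit status: non-zero means the read failed.
read_badblock_direct() {
    dev=$1
    block=$2
    count=$3
    dd if="$dev" of=/dev/null bs=512 skip="$block" count="$count" iflag=direct
}
```

With the values from the original report, that would be
"read_badblock_direct /dev/pmem2 35000 10". If the range is a known
badblock, dd should exit non-zero with "Input/output error" rather than
copying 10 records.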
