RE: Question on Error Injection

Elliott, Robert (Persistent Memory) Thu, 03 Jan 2019 17:56:40 -0800

> -----Original Message-----
> From: Verma, Vishal L <vishal.l.ve...@intel.com>
> Sent: Thursday, January 3, 2019 6:03 PM
> To: kamalkakri2...@yahoo.com; linux-nvdimm@lists.01.org; Elliott, Robert
> (Persistent Memory) <elli...@hpe.com>
> Subject: Re: Question on Error Injection
> 
> 
> On Thu, 2019-01-03 at 22:30 +0000, Elliott, Robert (Persistent Memory)
> wrote:
> > > -----Original Message-----
> > > From: Linux-nvdimm <linux-nvdimm-boun...@lists.01.org> On Behalf Of Verma, Vishal L
> > > Sent: Thursday, January 3, 2019 3:27 PM
> > > To: kamalkakri2...@yahoo.com; linux-nvdimm@lists.01.org
> > > Subject: Re: Question on Error Injection
> > >
> > >
> > > On Thu, 2019-01-03 at 20:02 +0000, Kamal Kakri wrote:
> > > > My device has errors injected:
> > > > # ndctl inject-error --status namespace2.0
> > > > {
> > > >   "badblocks":[
> > > >     {
> > > >       "block":35000,
> > > >       "count":10
> > > >     }
> > > >   ]
> > > > }
> > > >
> > > > No problem reading from the bad offsets:
> > > > # dd if=/dev/pmem2 of=/tmp/pmem_out bs=512 count=10 skip=35000
> > > > 10+0 records in
> > > > 10+0 records out
> > > > 5120 bytes (5.1 kB) copied, 0.000108226 s, 47.3 MB/s
> > >
> > > Did you ever read from /dev/pmem2 before injecting the error? There is
> > > a possibility that the page is already present in the page cache and
> > > the read gets serviced from there. You can set iflag=direct to ensure
> > > you're reading from the device.
> > >
> > > Other than that, there /should/ have been an MCE/sigbus in this case.
> > > I'd check with your hardware/platform vendor to ensure machine checks
> > > are available, and to ensure that injecting an error does result in a
> > > memory error/poison consumption by the CPU.
> >
> > An application like dd making traditional read() calls should see
> > them fail and report it like this:
> >   dd: error reading '/dev/pmem2': Input/output error
> >
> > The application itself shouldn't be terminated with SIGBUS - that's
> > for an application doing memory accesses that cannot be resolved.
> 
> Correct, an IO error will occur when the error is 'known', i.e. in
> badblocks. But in the case of injection with --no-notify, the error is
> a latent one, and the kernel can't prevent access to the poison
> location, and in this case the application will receive a SIGBUS when
> the page is faulted in.

memcpy_mcsafe(), used by the kernel pmem driver, is designed to tolerate
latent errors like that. There will still be a machine check so the
kernel can record the bad block, but the application won't see it
except as read() failing.
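
To make that concrete, here is a rough Python sketch (not taken from ndctl
or the kernel) of how an application could probe a range of sectors with
ordinary read-style calls and collect the ones that fail with EIO. The
function name find_unreadable_sectors and its direct parameter are made up
for illustration. It assumes 512-byte sectors, as in the dd command quoted
above, and that the device accepts 512-byte O_DIRECT transfers.

```python
import errno
import mmap
import os

SECTOR = 512  # assumption: matches the bs=512 used in the quoted dd command


def find_unreadable_sectors(path, first_sector, count, direct=True):
    """Return the sector numbers in [first_sector, first_sector + count)
    whose read fails with EIO (hypothetical helper, for illustration)."""
    flags = os.O_RDONLY
    if direct:
        # Bypass the page cache, in the same spirit as dd iflag=direct.
        flags |= os.O_DIRECT
    fd = os.open(path, flags)
    # An anonymous mmap is page aligned, which O_DIRECT needs for the buffer.
    buf = mmap.mmap(-1, SECTOR)
    bad = []
    try:
        for sector in range(first_sector, first_sector + count):
            try:
                os.preadv(fd, [buf], sector * SECTOR)
            except OSError as e:
                if e.errno != errno.EIO:
                    raise  # anything other than an I/O error is unexpected
                bad.append(sector)
    finally:
        buf.close()
        os.close(fd)
    return bad
```

For the namespace in this thread the call would be
find_unreadable_sectors("/dev/pmem2", 35000, 10). If the driver catches the
poison as described, the failing sectors should come back in the list and
the process should not get a SIGBUS, but that is the expectation rather
than a tested result.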




_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm