My device has errors injected:
# ndctl inject-error --status namespace2.0
{
  "badblocks":[
    {
      "block":35000,
      "count":10
    }
  ]
}

No problem reading from the bad offsets:
 # dd if=/dev/pmem2 of=/tmp/pmem_out bs=512 count=10 skip=35000
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.000108226 s, 47.3 MB/s
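
For reference, a quick sketch of the byte range those sectors cover, assuming the block and count values are in 512-byte sectors (the variable names are only for illustration):

```shell
# Assumption: badblocks entries are expressed in 512-byte sectors.
sector=35000
count=10
sector_size=512

# Byte offset of the first bad sector and total length of the bad range.
offset=$((sector * sector_size))
length=$((count * sector_size))

echo "byte offset: $offset, length: $length"
```

The length matches the 5120 bytes that dd reports copying above.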

The kernel doesn't know about the badblocks yet, so this read should have
resulted in a SIGBUS for dd:
# cat /sys/block/pmem2/badblocks
#
I don't have the mcelog daemon running, but there is no error in
/var/log/messages for the pmem device. Is there some setting/config that I am
missing?

-KK


    On Thursday, January 3, 2019, 1:39:44 PM EST, Verma, Vishal L 
<vishal.l.ve...@intel.com> wrote:  
 
 
On Thu, 2019-01-03 at 17:13 +0000, Kamal Kakri wrote:
> I am playing around with ndctl inject-error and have a few questions
> around the behavior of the application when an error occurs.
> After successfully injecting error with --no-notify, I am able to
> read and write to the namespace device with no problems. For e.g.:
> 
> # ndctl inject-error --block=35000 --count=10 --no-notify namespace2.0
> {
>  "dev":"namespace2.0",
>  "mode":"raw",
>  "size":17179869184,
>  "blockdev":"pmem2"
> }
> 
> 
> # dd  if=/dev/pmem2 of=/tmp/pmem-dump bs=512 count=10 seek=35000 oflag=direct

I think you want 'skip=35000' here instead of seek= to read from that
offset in the input.
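
To illustrate the difference, here is a small sketch that runs against a scratch file rather than a pmem device (the temporary file names and fill letters are made up for the example). skip= offsets into the input, while seek= offsets into the output:

```shell
#!/bin/sh
# Illustration only: operates on scratch files, not on /dev/pmem2.
tmp=$(mktemp -d)

# Build a 4-sector input where each 512-byte sector is filled with one letter.
for c in A B C D; do
    head -c 512 /dev/zero | tr '\0' "$c"
done > "$tmp/in"

# skip=2 skips two *input* sectors, so the copied sector is the "C" one.
dd if="$tmp/in" of="$tmp/skip_out" bs=512 count=1 skip=2 2>/dev/null

# seek=2 still reads input sector 0 ("A") but writes it at *output* sector 2.
dd if="$tmp/in" of="$tmp/seek_out" bs=512 count=1 seek=2 2>/dev/null

skip_first=$(head -c 1 "$tmp/skip_out")
seek_size=$(wc -c < "$tmp/seek_out" | tr -d ' ')
seek_last=$(tail -c 1 "$tmp/seek_out")

echo "skip: first byte=$skip_first"
echo "seek: size=$seek_size last byte=$seek_last"
rm -rf "$tmp"
```

With seek=, the read always comes from the start of the input, so a read test against sector 35000 of the device needs skip=.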

> 10+0 records in
> 10+0 records out
> 5120 bytes (5.1 kB) copied, 0.0128088 s, 400 kB/s
> 
> # pwd
> /sys/block/pmem2
> # cat badblocks
> #    ----------> empty badblock list

With --no-notify badblocks is expected to be empty, as ACPI will not
notify the OS of new errors.

> 
> 
> [Question] Shouldn't my "dd" get a SIGBUS (default machine-check
> handling) when it encounters badblocks that it's not aware of
> (no-notify)?

Yes it should - I'd be curious to see if you still don't get a machine
check with the seek/skip fix above.

> 
> 
> I tried both reading from and writing to the badblocks and things just
> work. If I scrub my NVDIMMs (ndctl start-scrub), the badblocks show up
> in the device badblock list (/sys/block/pmem/badblocks), but dd still
> works, and writing the blocks clears out the badblock list:
> # cat /sys/block/pmem2/badblocks
> 35000 10
> 
> # dd if=/dev/zero of=/dev/pmem2 bs=512 count=10 seek=35000 oflag=direct
> 10+0 records in
> 10+0 records out

Writing with O_DIRECT is the canonical way to clear errors - what you
might see here is a corrected machine check notification in your kernel
logs (CMCI), but that is just a notification that the platform has
handled the error and no action is required.
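
As a rough sketch of that clearing step, the helper below (print_clear_cmds is a hypothetical name, not part of ndctl) turns "start count" lines, in the format of the badblocks output quoted above, into the matching O_DIRECT dd commands. It only prints the commands and does not run them:

```shell
# Hypothetical helper: reads "start count" pairs on stdin and prints one
# dd command per range. Nothing is written to the device.
print_clear_cmds() {
    dev=$1
    while read -r start count; do
        # Skip blank lines (an empty badblocks list produces no commands).
        [ -n "$start" ] || continue
        echo "dd if=/dev/zero of=/dev/$dev bs=512 count=$count seek=$start oflag=direct"
    done
}

# Sample input copied from the badblocks listing above. On a real system
# the input would come from /sys/block/pmem2/badblocks instead.
cmds=$(printf '35000 10\n' | print_clear_cmds pmem2)
echo "$cmds"
```

Printing rather than executing keeps the sketch safe to try, since the real commands overwrite the affected sectors with zeros.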

    -Vishal

  
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
