My device has errors injected:

# ndctl inject-error --status namespace2.0
{
  "badblocks":[
    {
      "block":35000,
      "count":10
    }
  ]
}

No problem reading from the bad offsets:

# dd if=/dev/pmem2 of=/tmp/pmem_out bs=512 count=10 skip=35000
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.000108226 s, 47.3 MB/s

The kernel doesn't know about the bad blocks yet, so this should have resulted in a SIGBUS for dd:

# cat /sys/block/pmem2/badblocks
#

I don't have the mcelog daemon running, but there is no error in /var/log/messages for the pmem device. Is there some setting/config that I am missing?

-KK

On Thursday, January 3, 2019, 1:39:44 PM EST, Verma, Vishal L <vishal.l.ve...@intel.com> wrote:

On Thu, 2019-01-03 at 17:13 +0000, Kamal Kakri wrote:
> I am playing around with ndctl inject-error and have a few questions
> about the behavior of the application when an error occurs.
> After successfully injecting an error with --no-notify, I am able to
> read and write to the namespace device with no problems. For example:
>
> # ndctl inject-error --block=35000 --count=10 --no-notify namespace2.0
> {
>   "dev":"namespace2.0",
>   "mode":"raw",
>   "size":17179869184,
>   "blockdev":"pmem2"
> }
>
> # dd if=/dev/pmem2 of=/tmp/pmem-dump bs=512 count=10 seek=35000 oflag=direct

I think you want 'skip=35000' here instead of seek= to read from that
offset in the input.

> 10+0 records in
> 10+0 records out
> 5120 bytes (5.1 kB) copied, 0.0128088 s, 400 kB/s
>
> # pwd
> /sys/block/pmem2
> # cat badblocks
> #    ----------> empty badblock list

With --no-notify, badblocks is expected to be empty, as ACPI will not
notify the OS of new errors.

> [Question] Shouldn't my "dd" get a SIGBUS (default machine-check
> handling) when it encounters badblocks that it's not aware of
> (--no-notify)?

Yes it should - I'd be curious to see if you still don't get a machine
check with the seek/skip fix above.

> I tried both reading and writing to the badblocks, and things just
> work.
> If I scrub my NVDIMMs (ndctl start-scrub), the badblocks show up
> in the device badblock list (/sys/block/pmem2/badblocks), but dd
> still works, and writing the blocks clears out the badblock list:
>
> # cat /sys/block/pmem2/badblocks
> 35000 10
>
> # dd if=/dev/zero of=/dev/pmem2 bs=512 count=10 seek=35000 oflag=direct
> 10+0 records in
> 10+0 records out

Writing with O_DIRECT is the canonical way to clear errors. What you
might see here is a corrected machine check notification (CMCI) in your
kernel logs, but that is just a notification that the platform has
handled the error and no action is required.

-Vishal

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
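
The seek/skip point in the thread comes down to sector arithmetic: dd's seek= offsets only the output, so the first command read input sectors 0-9 and never touched the injected range, while skip=35000 reads sectors 35000-35009. A minimal sketch, assuming bs=512 so that dd blocks line up with the 512-byte sector units of the badblocks file ("<sector> <count>" per line). The function name `overlaps` is hypothetical, and the check only shows whether a read range intersects a listed bad range; it cannot predict whether the platform actually raises a machine check.

```shell
#!/bin/sh
# Hypothetical helper: does a dd input range of COUNT 512-byte blocks
# starting at START overlap any "<sector> <count>" line in a badblocks file?
# Assumes bs=512, so dd blocks and badblocks sectors share the same unit.

# overlaps BADBLOCKS_FILE START COUNT
# Prints each overlapping bad range; exit status 0 if any overlap, else 1.
overlaps() {
    awk -v s="$2" -v n="$3" '
        BEGIN { s += 0; n += 0 }   # force numeric comparisons
        NF == 2 {
            # [s, s+n) and [$1, $1+$2) intersect iff each starts before the other ends
            if (s < $1 + $2 && $1 < s + n) { print $1, $2; hit = 1 }
        }
        END { exit (hit ? 0 : 1) }
    ' "$1"
}

bb=$(mktemp)
printf '35000 10\n' > "$bb"   # same range as the injected error in the thread

# skip=35000 count=10: dd reads input sectors 35000-35009
if overlaps "$bb" 35000 10; then echo "skip=35000: overlaps a bad range"; fi

# seek=35000 count=10: seek only moves the OUTPUT; input sectors are 0-9
if ! overlaps "$bb" 0 10; then echo "seek=35000: input sectors 0-9 are clean"; fi

rm -f "$bb"
```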
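
Since writing with O_DIRECT is described above as the canonical way to clear errors, a populated badblocks list can be turned into one zero-write per entry. A hedged sketch: the hypothetical function `clear_cmds` only prints dd commands of the same form used in the thread rather than running them, because overwriting a range destroys whatever data it held. It assumes each badblocks line is "<sector> <count>" in 512-byte sectors.

```shell
#!/bin/sh
# Hypothetical helper: print (do NOT run) one O_DIRECT zero-write per
# badblocks entry. Running the printed commands destroys data in those
# sectors. Assumes "<sector> <count>" lines in 512-byte units.

# clear_cmds DEVICE BADBLOCKS_FILE
clear_cmds() {
    # "|| [ -n ... ]" also handles a final line without a trailing newline
    while read -r sector count || [ -n "$count" ]; do
        [ -n "$count" ] || continue   # skip blank or malformed lines
        echo "dd if=/dev/zero of=$1 bs=512 count=$count seek=$sector oflag=direct"
    done < "$2"
}

bb=$(mktemp)
printf '35000 10\n' > "$bb"   # stand-in for the contents of /sys/block/pmem2/badblocks
clear_cmds /dev/pmem2 "$bb"
rm -f "$bb"
```

Printing instead of executing keeps the destructive step explicit: the commands can be reviewed against the badblocks list before anything is overwritten.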