On Thu, Aug 24, 2017 at 1:32 PM, Verma, Vishal L
<vishal.l.ve...@intel.com> wrote:
> On Wed, 2017-08-23 at 17:23 +0000, Kani, Toshimitsu wrote:
>> On Tue, 2017-08-22 at 16:19 -0600, Vishal Verma wrote:
>>  :
>> > +
>> > +/* The block had a media error, and needs to be cleared */
>> > +if (btt_is_badblock(btt, arena, arena->freelist[lane].block)) {
>> > +	arena->freelist[lane].has_err = 1;
>> > +	nd_region_release_lane(btt->nd_region, lane);
>> > +
>> > +	arena_clear_freelist_error(arena, lane);
>> > +	/* OK to acquire a different lane/free block */
>> > +	goto retry;
>>
>> I hit an infinite clear loop when the DSM Clear Uncorrectable Error
>> function fails.  I haven't looked into the details, but I suspect this
>> unconditional retry is the cause.
>
> Thanks Toshi - that makes sense. I think the right thing to do would be
> to return an EIO if the DSM fails, yes? Or should we ignore the fact that
> there was an error, clear ->has_err, and let the write take its course
> (possibly generating a CMCI)?
>
> It will still be in the badblock list, and for reads ->rw_bytes will
> still check and fail them.
>
> I'll send out a new series with a fix, but we really need to get a unit
> test for BTT error clearing, and I'm working on implementing the new
> error injection DSMs in libndctl and nfit_test to do that.
>

I think as much as possible we should try to not fail writes. Leave
the badblock entry in place so that we get an error on the next read.
Upper-level software reacts more aggressively to write errors than
read errors.
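
To make that concrete, here is a rough userspace sketch in C, not a
patch against btt.c. Everything in it (struct free_entry, write_block(),
read_block(), the clear callbacks) is a hypothetical stand-in; the real
code would go through the freelist, the lane locking, and the badblocks
list. The only point is the control flow: attempt the clear once, never
loop or fail the write because the clear failed, and leave the bad block
recorded so that a later read still sees an error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one free-list entry; not the kernel struct. */
struct free_entry {
	unsigned int block;
	bool has_err;       /* a clear is pending for this block */
	bool in_badblocks;  /* block is still recorded as bad */
};

/* Hypothetical clear hook: 0 on success, negative errno if the DSM fails. */
typedef int (*clear_fn)(struct free_entry *e);

/* Simulates a successful clear: the badblock record goes away. */
int clear_ok(struct free_entry *e)
{
	e->in_badblocks = false;
	return 0;
}

/* Simulates a failing DSM: the badblock record stays in place. */
int clear_fail(struct free_entry *e)
{
	(void)e;
	return -EIO;
}

/*
 * Write path: try to clear a known-bad block exactly once, then carry on
 * with the write regardless of the outcome. *attempts reports how many
 * clears were issued, so a caller can see that there is no retry loop.
 */
int write_block(struct free_entry *e, clear_fn clear, int *attempts)
{
	assert(e != NULL && clear != NULL && attempts != NULL);
	*attempts = 0;

	if (e->in_badblocks) {
		e->has_err = true;
		(*attempts)++;
		/* Result ignored on purpose: a failed clear must not fail the write. */
		(void)clear(e);
		e->has_err = false;
	}

	/* The data write itself would be issued here. */
	return 0;
}

/* Read path: any block still recorded as bad fails with -EIO. */
int read_block(const struct free_entry *e)
{
	return e->in_badblocks ? -EIO : 0;
}
```

With clear_fail() the write still returns 0 after a single clear
attempt, and read_block() keeps returning -EIO until some later clear
succeeds. With clear_ok() both paths return 0.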
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
