Re: [PATCH v6 6/6] libnvdimm, btt: rework error clearing

2017-08-24 Thread Kani, Toshimitsu
On Thu, 2017-08-24 at 17:07 -0400, Jeff Moyer wrote:
> Dan Williams  writes:
> 
> > > > I hit an infinite clear loop when DSM Clear Uncorrectable Error
> > > > function fails.  Haven't looked into the details, but I suspect
> > > > this unconditional retry is the cause of this.
> > > 
> > > Thanks Toshi - that makes sense. I think the right thing to do
> > > would be if the DSM fails, return an EIO yes? (Or should we
> > > ignore the fact that there was an error, clear ->has_err, and let
> > > the write take its course (possibly generate a CMCI)
> > > 
> > > It will still be in the badblock list, and for reads ->rw_bytes
> > > will still check and fail them.
> > > 
> > > I'll send out a new series with a fix, but we really need to get
> > > a unit test for BTT error clearing, and I'm working on
> > > implementing the new error injection DSMs in libndctl and
> > > nfit_test to do that.
> > > 
> > 
> > I think as much as possible we should try to not fail writes. Leave
> > the badblock entry in place so that we get an error on the next
> > read. Upper-level software reacts more aggressively to write errors
> > than read errors.
> 
> I don't think it's wise to lie about data integrity.  If a write
> cannot be completed, it *needs* to fail.  You can't make any
> assumptions about what applications will do with the result.

Agreed.  pmem driver returns with EIO on write in this scenario as
well.

Thanks,
-Toshi
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v6 6/6] libnvdimm, btt: rework error clearing

2017-08-24 Thread Jeff Moyer
Dan Williams  writes:

>>> I hit an infinite clear loop when DSM Clear Uncorrectable Error
>>> function fails.  Haven't looked into the details, but I suspect this
>>> unconditional retry is the cause of this.
>>
>> Thanks Toshi - that makes sense. I think the right thing to do would be
>> if the DSM fails, return an EIO yes? (Or should we ignore the fact that
>> there was an error, clear ->has_err, and let the write take its course
>> (possibly generate a CMCI)
>>
>> It will still be in the badblock list, and for reads ->rw_bytes will
>> still check and fail them.
>>
>> I'll send out a new series with a fix, but we really need to get a unit
>> test for BTT error clearing, and I'm working on implementing the new
>> error injection DSMs in libndctl and nfit_test to do that.
>>
>
> I think as much as possible we should try to not fail writes. Leave
> the badblock entry in place so that we get an error on the next read.
> Upper-level software reacts more aggressively to write errors than
> read errors.

I don't think it's wise to lie about data integrity.  If a write cannot
be completed, it *needs* to fail.  You can't make any assumptions about
what applications will do with the result.

-Jeff
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v6 6/6] libnvdimm, btt: rework error clearing

2017-08-24 Thread Dan Williams
On Thu, Aug 24, 2017 at 1:32 PM, Verma, Vishal L
 wrote:
> On Wed, 2017-08-23 at 17:23 +, Kani, Toshimitsu wrote:
>> On Tue, 2017-08-22 at 16:19 -0600, Vishal Verma wrote:
>>  :
>> > +
>> > +/* The block had a media error, and needs to be
>> > cleared */
>> > +if (btt_is_badblock(btt, arena, arena-
>> > > freelist[lane].block)) {
>> >
>> > +arena->freelist[lane].has_err = 1;
>> > +nd_region_release_lane(btt->nd_region,
>> > lane);
>> > +
>> > +arena_clear_freelist_error(arena, lane);
>> > +/* OK to acquire a different lane/free block
>> > */
>> > +goto retry;
>>
>> I hit an infinite clear loop when DSM Clear Uncorrectable Error
>> function fails.  Haven't looked into the details, but I suspect this
>> unconditional retry is the cause of this.
>
> Thanks Toshi - that makes sense. I think the right thing to do would be
> if the DSM fails, return an EIO yes? (Or should we ignore the fact that
> there was an error, clear ->has_err, and let the write take its course
> (possibly generate a CMCI)
>
> It will still be in the badblock list, and for reads ->rw_bytes will
> still check and fail them.
>
> I'll send out a new series with a fix, but we really need to get a unit
> test for BTT error clearing, and I'm working on implementing the new
> error injection DSMs in libndctl and nfit_test to do that.
>

I think as much as possible we should try to not fail writes. Leave
the badblock entry in place so that we get an error on the next read.
Upper-level software reacts more aggressively to write errors than
read errors.
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v6 6/6] libnvdimm, btt: rework error clearing

2017-08-24 Thread Verma, Vishal L
On Wed, 2017-08-23 at 17:23 +, Kani, Toshimitsu wrote:
> On Tue, 2017-08-22 at 16:19 -0600, Vishal Verma wrote:
>  :
> > +
> > +/* The block had a media error, and needs to be
> > cleared */
> > +if (btt_is_badblock(btt, arena, arena-
> > > freelist[lane].block)) {
> > 
> > +arena->freelist[lane].has_err = 1;
> > +nd_region_release_lane(btt->nd_region,
> > lane);
> > +
> > +arena_clear_freelist_error(arena, lane);
> > +/* OK to acquire a different lane/free block
> > */
> > +goto retry;
> 
> I hit an infinite clear loop when DSM Clear Uncorrectable Error
> function fails.  Haven't looked into the details, but I suspect this
> unconditional retry is the cause of this.

Thanks Toshi - that makes sense. I think the right thing to do would be
if the DSM fails, return an EIO yes? (Or should we ignore the fact that
there was an error, clear ->has_err, and let the write take its course
(possibly generate a CMCI)

It will still be in the badblock list, and for reads ->rw_bytes will
still check and fail them.

I'll send out a new series with a fix, but we really need to get a unit
test for BTT error clearing, and I'm working on implementing the new
error injection DSMs in libndctl and nfit_test to do that.

> 
> Thanks,
> -Toshi
> 
> 
___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


Re: [PATCH v6 6/6] libnvdimm, btt: rework error clearing

2017-08-23 Thread Kani, Toshimitsu
On Tue, 2017-08-22 at 16:19 -0600, Vishal Verma wrote:
 :
> +
> + /* The block had a media error, and needs to be
> cleared */
> + if (btt_is_badblock(btt, arena, arena-
> >freelist[lane].block)) {
> + arena->freelist[lane].has_err = 1;
> + nd_region_release_lane(btt->nd_region,
> lane);
> +
> + arena_clear_freelist_error(arena, lane);
> + /* OK to acquire a different lane/free block
> */
> + goto retry;

I hit an infinite clear loop when DSM Clear Uncorrectable Error
function fails.  Haven't looked into the details, but I suspect this
unconditional retry is the cause of this.

Thanks,
-Toshi


___
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm


[PATCH v6 6/6] libnvdimm, btt: rework error clearing

2017-08-22 Thread Vishal Verma
Clearing errors or badblocks during a BTT write requires sending an ACPI
DSM, which means potentially sleeping. Since a BTT IO happens in atomic
context (preemption disabled, spinlocks may be held), we cannot perform
error clearing in the course of an IO. Due to this error clearing for
BTT IOs has hitherto been disabled.

In this patch we move error clearing out of the atomic section, and thus
re-enable error clearing with BTTs. When we are about to add a block to
the free list, we check if it was previously marked as an error, and if
it was, we add it to the freelist, but also set a flag that says error
clearing will be required. We then drop the lane (ending the atomic
context), and send a zero buffer so that the error can be cleared. The
error flag in the free list is protected by the nd 'lane', and is set
only be a thread while it holds that lane. When the error is cleared,
the flag is cleared, but while holding a mutex for that freelist index.

When writing, we check for two things -
1/ If the freelist mutex is held or if the error flag is set. If so,
this is an error block that is being (or about to be) cleared.
2/ If the block is a known badblock based on nsio->bb

The second check is required because the BTT map error flag for a map
entry only gets set when an error LBA is read. If we write to a new
location that may not have the map error flag set, but still might be in
the region's badblock list, we can trigger an EIO on the write, which is
undesirable and completely avoidable.

Cc: Jeff Moyer 
Cc: Toshi Kani 
Cc: Dan Williams 
Signed-off-by: Vishal Verma 
---
 drivers/nvdimm/btt.c   | 109 -
 drivers/nvdimm/btt.h   |   5 +++
 drivers/nvdimm/claim.c |   8 
 3 files changed, 103 insertions(+), 19 deletions(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index e9dd651..f5cbd60 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -31,6 +31,11 @@ enum log_ent_request {
LOG_OLD_ENT
 };
 
+static u64 adjust_initial_offset(struct nd_btt *nd_btt, u64 offset)
+{
+   return offset + nd_btt->initial_offset;
+}
+
 static int arena_read_bytes(struct arena_info *arena, resource_size_t offset,
void *buf, size_t n, unsigned long flags)
 {
@@ -38,7 +43,7 @@ static int arena_read_bytes(struct arena_info *arena, 
resource_size_t offset,
struct nd_namespace_common *ndns = nd_btt->ndns;
 
/* arena offsets may be shifted from the base of the device */
-   offset += arena->nd_btt->initial_offset;
+   offset = adjust_initial_offset(nd_btt, offset);
return nvdimm_read_bytes(ndns, offset, buf, n, flags);
 }
 
@@ -49,7 +54,7 @@ static int arena_write_bytes(struct arena_info *arena, 
resource_size_t offset,
struct nd_namespace_common *ndns = nd_btt->ndns;
 
/* arena offsets may be shifted from the base of the device */
-   offset += arena->nd_btt->initial_offset;
+   offset = adjust_initial_offset(nd_btt, offset);
return nvdimm_write_bytes(ndns, offset, buf, n, flags);
 }
 
@@ -381,7 +386,9 @@ static int btt_flog_write(struct arena_info *arena, u32 
lane, u32 sub,
arena->freelist[lane].sub = 1 - arena->freelist[lane].sub;
if (++(arena->freelist[lane].seq) == 4)
arena->freelist[lane].seq = 1;
-   arena->freelist[lane].block = le32_to_cpu(ent->old_map);
+   if (ent_e_flag(ent->old_map))
+   arena->freelist[lane].has_err = 1;
+   arena->freelist[lane].block = le32_to_cpu(ent_lba(ent->old_map));
 
return ret;
 }
@@ -480,6 +487,40 @@ static int btt_log_init(struct arena_info *arena)
return ret;
 }
 
+static u64 to_namespace_offset(struct arena_info *arena, u64 lba)
+{
+   return arena->dataoff + ((u64)lba * arena->internal_lbasize);
+}
+
+static int arena_clear_freelist_error(struct arena_info *arena, u32 lane)
+{
+   int ret = 0;
+
+   if (arena->freelist[lane].has_err) {
+   void *zero_page = page_address(ZERO_PAGE(0));
+   u32 lba = arena->freelist[lane].block;
+   u64 nsoff = to_namespace_offset(arena, lba);
+   unsigned long len = arena->sector_size;
+
+   mutex_lock(>err_lock);
+
+   while (len) {
+   unsigned long chunk = min(len, PAGE_SIZE);
+
+   ret = arena_write_bytes(arena, nsoff, zero_page,
+   chunk, 0);
+   if (ret)
+   break;
+   len -= chunk;
+   nsoff += chunk;
+   if (len == 0)
+   arena->freelist[lane].has_err = 0;
+   }
+   mutex_unlock(>err_lock);
+   }
+   return ret;
+}
+
 static int btt_freelist_init(struct arena_info *arena)
 {
int old, new, ret;
@@ -505,6