Re: [PATCH v6 6/6] libnvdimm, btt: rework error clearing
On Thu, 2017-08-24 at 17:07 -0400, Jeff Moyer wrote: > Dan Williamswrites: > > > > > I hit an infinite clear loop when DSM Clear Uncorrectable Error > > > > function fails. Haven't looked into the details, but I suspect > > > > this unconditional retry is the cause of this. > > > > > > Thanks Toshi - that makes sense. I think the right thing to do > > > would be if the DSM fails, return an EIO yes? (Or should we > > > ignore the fact that there was an error, clear ->has_err, and let > > > the write take its course (possibly generate a CMCI) > > > > > > It will still be in the badblock list, and for reads ->rw_bytes > > > will still check and fail them. > > > > > > I'll send out a new series with a fix, but we really need to get > > > a unit test for BTT error clearing, and I'm working on > > > implementing the new error injection DSMs in libndctl and > > > nfit_test to do that. > > > > > > > I think as much as possible we should try to not fail writes. Leave > > the badblock entry in place so that we get an error on the next > > read. Upper-level software reacts more aggressively to write errors > > than read errors. > > I don't think it's wise to lie about data integrity. If a write > cannot be completed, it *needs* to fail. You can't make any > assumptions about what applications will do with the result. Agreed. pmem driver returns with EIO on write in this scenario as well. Thanks, -Toshi ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH v6 6/6] libnvdimm, btt: rework error clearing
Dan Williamswrites: >>> I hit an infinite clear loop when DSM Clear Uncorrectable Error >>> function fails. Haven't looked into the details, but I suspect this >>> unconditional retry is the cause of this. >> >> Thanks Toshi - that makes sense. I think the right thing to do would be >> if the DSM fails, return an EIO yes? (Or should we ignore the fact that >> there was an error, clear ->has_err, and let the write take its course >> (possibly generate a CMCI) >> >> It will still be in the badblock list, and for reads ->rw_bytes will >> still check and fail them. >> >> I'll send out a new series with a fix, but we really need to get a unit >> test for BTT error clearing, and I'm working on implementing the new >> error injection DSMs in libndctl and nfit_test to do that. >> > > I think as much as possible we should try to not fail writes. Leave > the badblock entry in place so that we get an error on the next read. > Upper-level software reacts more aggressively to write errors than > read errors. I don't think it's wise to lie about data integrity. If a write cannot be completed, it *needs* to fail. You can't make any assumptions about what applications will do with the result. -Jeff ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH v6 6/6] libnvdimm, btt: rework error clearing
On Thu, Aug 24, 2017 at 1:32 PM, Verma, Vishal Lwrote: > On Wed, 2017-08-23 at 17:23 +, Kani, Toshimitsu wrote: >> On Tue, 2017-08-22 at 16:19 -0600, Vishal Verma wrote: >> : >> > + >> > +/* The block had a media error, and needs to be >> > cleared */ >> > +if (btt_is_badblock(btt, arena, arena- >> > > freelist[lane].block)) { >> > >> > +arena->freelist[lane].has_err = 1; >> > +nd_region_release_lane(btt->nd_region, >> > lane); >> > + >> > +arena_clear_freelist_error(arena, lane); >> > +/* OK to acquire a different lane/free block >> > */ >> > +goto retry; >> >> I hit an infinite clear loop when DSM Clear Uncorrectable Error >> function fails. Haven't looked into the details, but I suspect this >> unconditional retry is the cause of this. > > Thanks Toshi - that makes sense. I think the right thing to do would be > if the DSM fails, return an EIO yes? (Or should we ignore the fact that > there was an error, clear ->has_err, and let the write take its course > (possibly generate a CMCI) > > It will still be in the badblock list, and for reads ->rw_bytes will > still check and fail them. > > I'll send out a new series with a fix, but we really need to get a unit > test for BTT error clearing, and I'm working on implementing the new > error injection DSMs in libndctl and nfit_test to do that. > I think as much as possible we should try to not fail writes. Leave the badblock entry in place so that we get an error on the next read. Upper-level software reacts more aggressively to write errors than read errors. ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH v6 6/6] libnvdimm, btt: rework error clearing
On Wed, 2017-08-23 at 17:23 +, Kani, Toshimitsu wrote: > On Tue, 2017-08-22 at 16:19 -0600, Vishal Verma wrote: > : > > + > > +/* The block had a media error, and needs to be > > cleared */ > > +if (btt_is_badblock(btt, arena, arena- > > > freelist[lane].block)) { > > > > +arena->freelist[lane].has_err = 1; > > +nd_region_release_lane(btt->nd_region, > > lane); > > + > > +arena_clear_freelist_error(arena, lane); > > +/* OK to acquire a different lane/free block > > */ > > +goto retry; > > I hit an infinite clear loop when DSM Clear Uncorrectable Error > function fails. Haven't looked into the details, but I suspect this > unconditional retry is the cause of this. Thanks Toshi - that makes sense. I think the right thing to do would be if the DSM fails, return an EIO yes? (Or should we ignore the fact that there was an error, clear ->has_err, and let the write take its course (possibly generate a CMCI) It will still be in the badblock list, and for reads ->rw_bytes will still check and fail them. I'll send out a new series with a fix, but we really need to get a unit test for BTT error clearing, and I'm working on implementing the new error injection DSMs in libndctl and nfit_test to do that. > > Thanks, > -Toshi > > ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
Re: [PATCH v6 6/6] libnvdimm, btt: rework error clearing
On Tue, 2017-08-22 at 16:19 -0600, Vishal Verma wrote: : > + > + /* The block had a media error, and needs to be > cleared */ > + if (btt_is_badblock(btt, arena, arena- > >freelist[lane].block)) { > + arena->freelist[lane].has_err = 1; > + nd_region_release_lane(btt->nd_region, > lane); > + > + arena_clear_freelist_error(arena, lane); > + /* OK to acquire a different lane/free block > */ > + goto retry; I hit an infinite clear loop when DSM Clear Uncorrectable Error function fails. Haven't looked into the details, but I suspect this unconditional retry is the cause of this. Thanks, -Toshi ___ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
[PATCH v6 6/6] libnvdimm, btt: rework error clearing
Clearing errors or badblocks during a BTT write requires sending an ACPI DSM, which means potentially sleeping. Since a BTT IO happens in atomic context (preemption disabled, spinlocks may be held), we cannot perform error clearing in the course of an IO. Due to this error clearing for BTT IOs has hitherto been disabled. In this patch we move error clearing out of the atomic section, and thus re-enable error clearing with BTTs. When we are about to add a block to the free list, we check if it was previously marked as an error, and if it was, we add it to the freelist, but also set a flag that says error clearing will be required. We then drop the lane (ending the atomic context), and send a zero buffer so that the error can be cleared. The error flag in the free list is protected by the nd 'lane', and is set only be a thread while it holds that lane. When the error is cleared, the flag is cleared, but while holding a mutex for that freelist index. When writing, we check for two things - 1/ If the freelist mutex is held or if the error flag is set. If so, this is an error block that is being (or about to be) cleared. 2/ If the block is a known badblock based on nsio->bb The second check is required because the BTT map error flag for a map entry only gets set when an error LBA is read. If we write to a new location that may not have the map error flag set, but still might be in the region's badblock list, we can trigger an EIO on the write, which is undesirable and completely avoidable. Cc: Jeff MoyerCc: Toshi Kani Cc: Dan Williams Signed-off-by: Vishal Verma --- drivers/nvdimm/btt.c | 109 - drivers/nvdimm/btt.h | 5 +++ drivers/nvdimm/claim.c | 8 3 files changed, 103 insertions(+), 19 deletions(-) diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c index e9dd651..f5cbd60 100644 --- a/drivers/nvdimm/btt.c +++ b/drivers/nvdimm/btt.c @@ -31,6 +31,11 @@ enum log_ent_request { LOG_OLD_ENT }; +static u64 adjust_initial_offset(struct nd_btt *nd_btt, u64 offset) +{ + return offset + nd_btt->initial_offset; +} + static int arena_read_bytes(struct arena_info *arena, resource_size_t offset, void *buf, size_t n, unsigned long flags) { @@ -38,7 +43,7 @@ static int arena_read_bytes(struct arena_info *arena, resource_size_t offset, struct nd_namespace_common *ndns = nd_btt->ndns; /* arena offsets may be shifted from the base of the device */ - offset += arena->nd_btt->initial_offset; + offset = adjust_initial_offset(nd_btt, offset); return nvdimm_read_bytes(ndns, offset, buf, n, flags); } @@ -49,7 +54,7 @@ static int arena_write_bytes(struct arena_info *arena, resource_size_t offset, struct nd_namespace_common *ndns = nd_btt->ndns; /* arena offsets may be shifted from the base of the device */ - offset += arena->nd_btt->initial_offset; + offset = adjust_initial_offset(nd_btt, offset); return nvdimm_write_bytes(ndns, offset, buf, n, flags); } @@ -381,7 +386,9 @@ static int btt_flog_write(struct arena_info *arena, u32 lane, u32 sub, arena->freelist[lane].sub = 1 - arena->freelist[lane].sub; if (++(arena->freelist[lane].seq) == 4) arena->freelist[lane].seq = 1; - arena->freelist[lane].block = le32_to_cpu(ent->old_map); + if (ent_e_flag(ent->old_map)) + arena->freelist[lane].has_err = 1; + arena->freelist[lane].block = le32_to_cpu(ent_lba(ent->old_map)); return ret; } @@ -480,6 +487,40 @@ static int btt_log_init(struct arena_info *arena) return ret; } +static u64 to_namespace_offset(struct arena_info *arena, u64 lba) +{ + return arena->dataoff + ((u64)lba * arena->internal_lbasize); +} + +static int arena_clear_freelist_error(struct arena_info *arena, u32 lane) +{ + int ret = 0; + + if (arena->freelist[lane].has_err) { + void *zero_page = page_address(ZERO_PAGE(0)); + u32 lba = arena->freelist[lane].block; + u64 nsoff = to_namespace_offset(arena, lba); + unsigned long len = arena->sector_size; + + mutex_lock(>err_lock); + + while (len) { + unsigned long chunk = min(len, PAGE_SIZE); + + ret = arena_write_bytes(arena, nsoff, zero_page, + chunk, 0); + if (ret) + break; + len -= chunk; + nsoff += chunk; + if (len == 0) + arena->freelist[lane].has_err = 0; + } + mutex_unlock(>err_lock); + } + return ret; +} + static int btt_freelist_init(struct arena_info *arena) { int old, new, ret; @@ -505,6