On Thu, Nov 10, 2016 at 1:56 PM, Vishal Verma <vishal.l.ve...@intel.com> wrote:
> On 11/10, Dan Williams wrote:
>> On Thu, Nov 10, 2016 at 11:17 AM, Vishal Verma <vishal.l.ve...@intel.com> 
>> wrote:
>> > On 11/10, Dave Jiang wrote:
>> >> We need to clear any poison when we are writing to pmem. The granularity
>> >> will be sector size. If it's less then we can't do anything about it
>> >> barring corruption.
>> >>
>> >> Signed-off-by: Dave Jiang <dave.ji...@intel.com>
>> >> ---
>> >>  drivers/nvdimm/claim.c |   24 +++++++++++++++++++++---
>> >>  1 file changed, 21 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
>> >> index d5dc80c..306d8fd 100644
>> >> --- a/drivers/nvdimm/claim.c
>> >> +++ b/drivers/nvdimm/claim.c
>> >> @@ -226,6 +226,11 @@ static int nsio_rw_bytes(struct nd_namespace_common 
>> >> *ndns,
>> >>               resource_size_t offset, void *buf, size_t size, int rw)
>> >>  {
>> >>       struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
>> >> +     unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
>> >> +     sector_t sector = offset >> 9;
>> >> +
>> >> +     if (unlikely(!size))
>> >> +             return -EINVAL;
>> >>
>> >>       if (unlikely(offset + size > nsio->size)) {
>> >>               dev_WARN_ONCE(&ndns->dev, 1, "request out of range\n");
>> >> @@ -233,12 +238,25 @@ static int nsio_rw_bytes(struct nd_namespace_common 
>> >> *ndns,
>> >>       }
>> >>
>> >>       if (rw == READ) {
>> >> -             unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 
>> >> 512);
>> >> -
>> >> -             if (unlikely(is_bad_pmem(&nsio->bb, offset / 512, 
>> >> sz_align)))
>> >> +             if (unlikely(is_bad_pmem(&nsio->bb, sector, sz_align)))
>> >>                       return -EIO;
>> >>               return memcpy_from_pmem(buf, nsio->addr + offset, size);
>> >>       } else {
>> >> +
>> >> +             if (unlikely(is_bad_pmem(&nsio->bb, sector, sz_align))) {
>> >> +                     if (IS_ALIGNED(offset, 512) && IS_ALIGNED(size, 
>> >> 512)) {
>> >> +                             long cleared;
>> >> +
>> >> +                             cleared = nvdimm_clear_poison(&ndns->dev,
>> >> +                                                           offset, size);
>> >
>> > Everything above looks good..
>> >
>> >> +                             if (cleared != size)
>> >> +                                     return -EIO;
>> >
>> > But I think we don't want to return -EIO here. Instead, ...
>> >> +
>> >> +                             badblocks_clear(&nsio->bb, sector,
>> >> +                                             cleared >> 9);
>> >> +                     }
>> >> +             }
>> >> +
>> >>               memcpy_to_pmem(nsio->addr + offset, buf, size);
>> >>               nvdimm_flush(to_nd_region(ndns->dev.parent));
>> >
>> > Do _some_ writes. We have two options:
>> > 1. Try to write the whole thing (size), and return success. If the write
>> > touches any remaining poison, it will not complain, but the poison will
>> > be retained, so subsequent reads will hit it and fail.
>> >
>> > 2. Only write what we were able to clear, i.e. 'cleared' bytes. And
>> > since we didn't complete the write, error out.
>> >
>> > I think 1 makes more sense, but I could be convinced otherwise..
>> >
>> > One wild idea might be:
>> > rw_bytes can write arbitrary lengths, and commonly does up to 4K at a
>> > time (BTT data writes). This gives us a rather large window where we
>> > could be clearing 4K worth of badblocks all together, and then writing
>> > 4K of data. If we crash anywhere in the middle, we're almost guaranteed
>> > silent data corruption.
>> >
>> > Instead, what if, for the known-errors case for writes, we break it down
>> > into smaller chunks:
>> > Go cache line by cache line - if it may have poison (based on
>> > badblocks), clear poison, followed by a cache line worth of data write.
>> > When we cross a sector boundary (512), do a badblocks_clear.
>> >
>> > I think this gives us the best chance against silent corruption in the
>> > absence of an atomic clear-error-and-write command.
>> >
>> > Thoughts?
>>
>> This feels like over-engineering a still not perfect solution to a
>> rare problem.  Outside of atomic-write-and-clear we should just keep
>> the code best effort and simple.
>
> Fair enough :) In that case I think the only change needed is to simply
> remove the cleared != size check, do badblocks_clear for "cleared >> 9"
> (as it is now), and then write "size", and return success. Sound good?
>

Return success even on an incomplete write? That's not the approach we
took with Toshi's recent change:

ommit 3115bb02b5c23d960df5f1bf551ec394a9bb10ec
Author: Toshi Kani <toshi.k...@hpe.com>
Date:   Thu Oct 13 09:54:21 2016 -0600

    pmem: report error on clear poison failure

    ACPI Clear Uncorrectable Error DSM function may fail or may be
    unsupported on a platform.  pmem_clear_poison() returns without clearing
    badblocks in such cases.  This failure is detected at the next read
    (-EIO).

    This behavior can lead to an issue when user keeps writing but does not
    read immediately.  For instance, flight recorder file may be only read
    when it is necessary for troubleshooting.

    Change pmem_do_bvec() and pmem_clear_poison() to return -EIO so that
    filesystem can log an error message on a write error.

    Cc: Vishal Verma <vishal.l.ve...@intel.com>
    Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
    Signed-off-by: Dan Williams <dan.j.willi...@intel.com>
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

Reply via email to