Re: tango_nand: is logic right in error cases? (was Re: fsl_ifc_nand: are blank pages protected by ECC?)
On Sun, 23 Apr 2017 11:58:45 +0200 Pavel Machek wrote: > Hi! > > > > Maybe I figured it out. Unfortunately, it is only compile tested. Does > > > it look approximately right? > > > > Yep that's definitely better. Just one thing missing (see below), > > otherwise it looks good. > > I'm copying from tango_nand, therefore I had to check tango_nand, too. > > static int check_erased_page(struct nand_chip *chip, u8 *buf) > { > ... > res = nand_check_erased_ecc_chunk(buf, pkt_size, ecc, > ecc_size, > meta, meta_len, > chip->ecc.strength); > if (res < 0) > mtd->ecc_stats.failed++; > else > mtd->ecc_stats.corrected += res; > > bitflips = max(res, bitflips); > ... > return bitflips; > } > > static int tango_read_page(struct mtd_info *mtd, struct nand_chip *chip, >u8 *buf, int oob_required, int page) > { > ... > res = decode_error_report(nfc); > if (res < 0) { > chip->ecc.read_oob_raw(mtd, chip, page); > res = check_erased_page(chip, buf); > } > > return res; > } > > > So nand_check_erased_ecc_chunk() returns < 0 (failed ECC), but then we > perform max() with bitflips (lets say 1, correctable ECC) and return > 1? tango_read_page then returns 1 (correctable ECC) forgetting about > failed ECC...? Yep, that's expected, see what's done in the core to detect uncorrectable errors and return EBADMSG when appropriate [1]. [1]http://lxr.free-electrons.com/source/drivers/mtd/nand/nand_base.c#L2033
tango_nand: is logic right in error cases? (was Re: fsl_ifc_nand: are blank pages protected by ECC?)
Hi! > > Maybe I figured it out. Unfortunately, it is only compile tested. Does > > it look approximately right? > > Yep that's definitely better. Just one thing missing (see below), > otherwise it looks good. I'm copying from tango_nand, therefore I had to check tango_nand, too. static int check_erased_page(struct nand_chip *chip, u8 *buf) { ... res = nand_check_erased_ecc_chunk(buf, pkt_size, ecc, ecc_size, meta, meta_len, chip->ecc.strength); if (res < 0) mtd->ecc_stats.failed++; else mtd->ecc_stats.corrected += res; bitflips = max(res, bitflips); ... return bitflips; } static int tango_read_page(struct mtd_info *mtd, struct nand_chip *chip, u8 *buf, int oob_required, int page) { ... res = decode_error_report(nfc); if (res < 0) { chip->ecc.read_oob_raw(mtd, chip, page); res = check_erased_page(chip, buf); } return res; } So nand_check_erased_ecc_chunk() returns < 0 (failed ECC), but then we perform max() with bitflips (lets say 1, correctable ECC) and return 1? tango_read_page then returns 1 (correctable ECC) forgetting about failed ECC...? Best regards, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: fsl_ifc_nand: are blank pages protected by ECC?
Hi! > > Maybe I figured it out. Unfortunately, it is only compile tested. Does > > it look approximately right? > > Yep that's definitely better. Just one thing missing (see below), > otherwise it looks good. Thanks for review. Unfortunately it is still untested, so... > > + if (res < 0) > > + mtd->ecc_stats.failed++; > > else > mtd->ecc_stats.corrected += res; Ok, I copied this from tango_nand. I'll submit a patch to fix it there... > > + > > + bitflips = max(res, bitflips); > > + buf += pkt_size; > > + ecc += ecc_size; > > + } > > + > > + mtd_ooblayout_ecc(mtd, 1, &oobregion); > > + BUG_ON(oobregion.length); > > Probably something you should check at registration time only. Drive defensively, buy a tank ;-). Ok, I'll delete this one in final version. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: fsl_ifc_nand: are blank pages protected by ECC?
On Fri, 21 Apr 2017 15:37:21 +0200 Pavel Machek wrote: > Hi! > > > (Added driver author to the cc list, maybe he can help). > > > > > UBIFS complains: > > > > > > > > UBIFS error (pid 931): ubifs_scan: corrupt empty space at LEB 282:252630 > > > > UBIFS error (pid 931): ubifs_scanned_corruption: corruption at LEB > > > > 282:252630 > > > > UBIFS error (pid 931): ubifs_scanned_corruption: first 1322 bytes from > > > > LEB 282:252630 > > > > UBIFS error (pid 931): ubifs_scan: LEB 282 scanning failed > ... > > > > is_blank() checks for all 0xff's, so single-bit 0xfe in the data will > > > > result in_blank() == 0 and uncorrectable error being signaled. > > > > > > > > Should the driver be modified somehow? > > > > > > Yep, nand_check_erased_ecc_chunk() [1] is here to help you check this > > > case, unfortunately, it's not directly applicable here, because this > > Maybe I figured it out. Unfortunately, it is only compile tested. Does > it look approximately right? Yep that's definitely better. Just one thing missing (see below), otherwise it looks good. > > Thanks, > Pavel > > --- a/drivers/mtd/nand/fsl_ifc_nand.c > +++ b/drivers/mtd/nand/fsl_ifc_nand.c > @@ -171,32 +171,6 @@ static void set_addr(struct mtd_info *mtd, int column, > int page_addr, int oob) > ifc_nand_ctrl->index += mtd->writesize; > } > > -static int is_blank(struct mtd_info *mtd, unsigned int bufnum) > -{ > - struct nand_chip *chip = mtd_to_nand(mtd); > - struct fsl_ifc_mtd *priv = nand_get_controller_data(chip); > - u8 __iomem *addr = priv->vbase + bufnum * (mtd->writesize * 2); > - u32 __iomem *mainarea = (u32 __iomem *)addr; > - u8 __iomem *oob = addr + mtd->writesize; > - struct mtd_oob_region oobregion = { }; > - int i, section = 0; > - > - i = nand_check_erased_buf(&mainarea[i], mtd->writesize, 0); > - if (i) > - return 0; > - > - mtd_ooblayout_ecc(mtd, section++, &oobregion); > - while (oobregion.length) { > - i = nand_check_erased_buf(&oob[oobregion.offset], > oobregion.length, 0); > - if (i) > - return 0; > - > - mtd_ooblayout_ecc(mtd, section++, &oobregion); > - } > - > - return 1; > -} > - > /* returns nonzero if entire page is blank */ > static int check_read_ecc(struct mtd_info *mtd, struct fsl_ifc_ctrl *ctrl, > u32 *eccstat, unsigned int bufnum) > @@ -272,16 +246,14 @@ static void fsl_ifc_run_command(struct mtd_info *mtd) > if (errors == 15) { > /* >* Uncorrectable error. > - * OK only if the whole page is blank. > + * We'll check for blank pages later. >* >* We disable ECCER reporting due to... >* erratum IFC-A002770 -- so report it now if we >* see an uncorrectable error in ECCSTAT. >*/ > - if (!is_blank(mtd, bufnum)) > - ctrl->nand_stat |= > - IFC_NAND_EVTER_STAT_ECCER; > - break; > + ctrl->nand_stat |= IFC_NAND_EVTER_STAT_ECCER; > + printk("NAND flash read: error but > continuing.\n"); > } > > mtd->ecc_stats.corrected += errors; > @@ -676,6 +648,40 @@ static int fsl_ifc_wait(struct mtd_info *mtd, struct > nand_chip *chip) > return nand_fsr | NAND_STATUS_WP; > } > > +/* > + * The controller does not check for bitflips in erased pages, > + * therefore software must check instead. > + */ > +static int check_erased_page(struct nand_chip *chip, u8 *buf) > +{ > + struct mtd_info *mtd = nand_to_mtd(chip); > + u8 *ecc = chip->oob_poi; > + const int ecc_size = chip->ecc.bytes; > + const int pkt_size = chip->ecc.size; > + int i, res, bitflips = 0; > + struct mtd_oob_region oobregion = { }; > + > + mtd_ooblayout_ecc(mtd, 0, &oobregion); > + ecc += oobregion.offset; > + > + for (i = 0; i < chip->ecc.steps; ++i) { > + res = nand_check_erased_ecc_chunk(buf, pkt_size, ecc, ecc_size, > + NULL, 0, > + chip->ecc.strength); > + if (res < 0) > + mtd->ecc_stats.failed++; else mtd->ecc_stats.corrected += res; > + > + bitflips = max(res, bitflips); > + buf += pkt_size; > + ecc += ecc_size; > + } > + > + mtd_ooblayout_ecc(mtd, 1, &oobregion); > + BUG_ON(oobregion.length); Probably something you shoul
Re: fsl_ifc_nand: are blank pages protected by ECC?
Hi! > (Added driver author to the cc list, maybe he can help). > > > UBIFS complains: > > > > > > UBIFS error (pid 931): ubifs_scan: corrupt empty space at LEB 282:252630 > > > UBIFS error (pid 931): ubifs_scanned_corruption: corruption at LEB > > > 282:252630 > > > UBIFS error (pid 931): ubifs_scanned_corruption: first 1322 bytes from > > > LEB 282:252630 > > > UBIFS error (pid 931): ubifs_scan: LEB 282 scanning failed ... > > > is_blank() checks for all 0xff's, so single-bit 0xfe in the data will > > > result in_blank() == 0 and uncorrectable error being signaled. > > > > > > Should the driver be modified somehow? > > > > Yep, nand_check_erased_ecc_chunk() [1] is here to help you check this > > case, unfortunately, it's not directly applicable here, because this Maybe I figured it out. Unfortunately, it is only compile tested. Does it look approximately right? Thanks, Pavel --- a/drivers/mtd/nand/fsl_ifc_nand.c +++ b/drivers/mtd/nand/fsl_ifc_nand.c @@ -171,32 +171,6 @@ static void set_addr(struct mtd_info *mtd, int column, int page_addr, int oob) ifc_nand_ctrl->index += mtd->writesize; } -static int is_blank(struct mtd_info *mtd, unsigned int bufnum) -{ - struct nand_chip *chip = mtd_to_nand(mtd); - struct fsl_ifc_mtd *priv = nand_get_controller_data(chip); - u8 __iomem *addr = priv->vbase + bufnum * (mtd->writesize * 2); - u32 __iomem *mainarea = (u32 __iomem *)addr; - u8 __iomem *oob = addr + mtd->writesize; - struct mtd_oob_region oobregion = { }; - int i, section = 0; - - i = nand_check_erased_buf(&mainarea[i], mtd->writesize, 0); - if (i) - return 0; - - mtd_ooblayout_ecc(mtd, section++, &oobregion); - while (oobregion.length) { - i = nand_check_erased_buf(&oob[oobregion.offset], oobregion.length, 0); - if (i) - return 0; - - mtd_ooblayout_ecc(mtd, section++, &oobregion); - } - - return 1; -} - /* returns nonzero if entire page is blank */ static int check_read_ecc(struct mtd_info *mtd, struct fsl_ifc_ctrl *ctrl, u32 *eccstat, unsigned int bufnum) @@ -272,16 +246,14 @@ static void fsl_ifc_run_command(struct mtd_info *mtd) if (errors == 15) { /* * Uncorrectable error. -* OK only if the whole page is blank. +* We'll check for blank pages later. * * We disable ECCER reporting due to... * erratum IFC-A002770 -- so report it now if we * see an uncorrectable error in ECCSTAT. */ - if (!is_blank(mtd, bufnum)) - ctrl->nand_stat |= - IFC_NAND_EVTER_STAT_ECCER; - break; + ctrl->nand_stat |= IFC_NAND_EVTER_STAT_ECCER; + printk("NAND flash read: error but continuing.\n"); } mtd->ecc_stats.corrected += errors; @@ -676,6 +648,40 @@ static int fsl_ifc_wait(struct mtd_info *mtd, struct nand_chip *chip) return nand_fsr | NAND_STATUS_WP; } +/* + * The controller does not check for bitflips in erased pages, + * therefore software must check instead. + */ +static int check_erased_page(struct nand_chip *chip, u8 *buf) +{ + struct mtd_info *mtd = nand_to_mtd(chip); + u8 *ecc = chip->oob_poi; + const int ecc_size = chip->ecc.bytes; + const int pkt_size = chip->ecc.size; + int i, res, bitflips = 0; + struct mtd_oob_region oobregion = { }; + + mtd_ooblayout_ecc(mtd, 0, &oobregion); + ecc += oobregion.offset; + + for (i = 0; i < chip->ecc.steps; ++i) { + res = nand_check_erased_ecc_chunk(buf, pkt_size, ecc, ecc_size, + NULL, 0, + chip->ecc.strength); + if (res < 0) + mtd->ecc_stats.failed++; + + bitflips = max(res, bitflips); + buf += pkt_size; + ecc += ecc_size; + } + + mtd_ooblayout_ecc(mtd, 1, &oobregion); + BUG_ON(oobregion.length); + + return bitflips; +} + static int fsl_ifc_read_page(struct mtd_info *mtd, struct nand_chip *chip, uint8_t *buf, int oob_required, int page) { @@ -687,11 +693,23 @@ static int fsl_ifc_read_page(struct mtd_info *mtd, struct nand_chip *chip, if (oob_required) fsl_ifc_read_buf(mtd, chip->oob_poi, mtd->oobsize); - if
Re: fsl_ifc_nand: are blank pages protected by ECC?
On Fri, 21 Apr 2017 12:08:13 +0200 Pavel Machek wrote: > Hi! > > (Added driver author to the cc list, maybe he can help). > > > > Hi! > > > > > > We have some problems with fsl_ifc_nand ... in the old kernels, but > > > this one does not seem to be fixed in v4.11, either. > > > > > > UBIFS complains: > > > > > > UBIFS error (pid 931): ubifs_scan: corrupt empty space at LEB 282:252630 > > > UBIFS error (pid 931): ubifs_scanned_corruption: corruption at LEB > > > 282:252630 > > > UBIFS error (pid 931): ubifs_scanned_corruption: first 1322 bytes from > > > LEB 282:252630 > > > UBIFS error (pid 931): ubifs_scan: LEB 282 scanning failed > > > > > > Possible explanation is here: > > > > > > https://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/289605 > > > > > > # I see on the forum that this issue has been raised before - my > > > # understanding is that the omap2 nand driver does not perform ECC > > > # detection/correction on empty pages so when UBIFS checks the empty > > > # space data and doesn't read all 0xFF then it fails and mounts > > > # read-only. I didn't find any good solution - only a workaround to > > > # remove the UBIFS check.. > > > > > > So I checked fsl_ifc_nand.c in v4.11-rc, and yes, it seems to have the > > > same problem: > > > > > > if (errors == 15) { > > > /* > > > * Uncorrectable error. > > > * OK only if the whole page is blank. > > > * > > > * We disable ECCER reporting due to... > > > * erratum IFC-A002770 -- so report it > > > now if we > > > * see an uncorrectable error in ECCSTAT. > > > */ > > > if (!is_blank(mtd, bufnum)) > > > ctrl->nand_stat |= > > > IFC_NAND_EVTER_STAT_ECCER; > > > break; > > > } > > > > > > is_blank() checks for all 0xff's, so single-bit 0xfe in the data will > > > result in_blank() == 0 and uncorrectable error being signaled. > > > > > > Should the driver be modified somehow? > > > > Yep, nand_check_erased_ecc_chunk() [1] is here to help you check this > > case, unfortunately, it's not directly applicable here, because this > > function takes regular pointers and not __iomem ones. You'll either > > have to copy the data in an intermediate buffer before calling > > nand_check_erased_ecc_chunk(), or cast the SRAM region to a void > > pointer (which is usually not a good idea). The last option would be to > > open code nand_check_erased_ecc_chunk(), but I'd really like to avoid > > that (for maintainability concerns). > > Ok, took a look. __iomem is part of a problem, another part is that > nand_check_erased_ecc_chunk() needs to actually write back 0xff's to > undo the corruption, which would probably be bad idea to do in the > iomem, and next one is that blank actually checks arbitrary number of > regions, based on ecc.layout. > > So this could be used to simplify the code (if nand_check_erased_buf > was exported; it is not), but it does not fix the problem as we still > need to undo the corruption. Actually, there was a good reason for not directly exporting this buffer (see Brian's comment here [1]), and I don't think we should start exporting it. This and the fact that passing an iomem pointer sounds like a bad idea makes me think you should modify the driver to put the data in a buffer when you want to check for bitflips in erased pages. > > Hints welcome, especially if you know right place where to put this > checking. Just had a quick look at the driver, and it seems like you could move things around to check for bitflips in erased pages after you've copied the data in the user buffer (in fsl_ifc_read_page()). > > (BTW, switching to ecc.mode = ECC_SOFT will cause compatibility > problems but should make the problem go away, right?) Nope, I don't think switching to ECC_SOFT is the right solution here. Regards, Boris [1]https://patchwork.ozlabs.org/patch/509970/
Re: fsl_ifc_nand: are blank pages protected by ECC?
Pavel, Am 21.04.2017 um 12:08 schrieb Pavel Machek: > (BTW, switching to ecc.mode = ECC_SOFT will cause compatibility > problems but should make the problem go away, right?) Yes and it is slow. So, fixing the driver is the way to go. :-) Thanks, //richard
Re: fsl_ifc_nand: are blank pages protected by ECC?
Hi! (Added driver author to the cc list, maybe he can help). > > Hi! > > > > We have some problems with fsl_ifc_nand ... in the old kernels, but > > this one does not seem to be fixed in v4.11, either. > > > > UBIFS complains: > > > > UBIFS error (pid 931): ubifs_scan: corrupt empty space at LEB 282:252630 > > UBIFS error (pid 931): ubifs_scanned_corruption: corruption at LEB > > 282:252630 > > UBIFS error (pid 931): ubifs_scanned_corruption: first 1322 bytes from LEB > > 282:252630 > > UBIFS error (pid 931): ubifs_scan: LEB 282 scanning failed > > > > Possible explanation is here: > > > > https://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/289605 > > > > # I see on the forum that this issue has been raised before - my > > # understanding is that the omap2 nand driver does not perform ECC > > # detection/correction on empty pages so when UBIFS checks the empty > > # space data and doesn't read all 0xFF then it fails and mounts > > # read-only. I didn't find any good solution - only a workaround to > > # remove the UBIFS check.. > > > > So I checked fsl_ifc_nand.c in v4.11-rc, and yes, it seems to have the > > same problem: > > > > if (errors == 15) { > > /* > > * Uncorrectable error. > > * OK only if the whole page is blank. > > * > > * We disable ECCER reporting due to... > > * erratum IFC-A002770 -- so report it now > > if we > > * see an uncorrectable error in ECCSTAT. > > */ > > if (!is_blank(mtd, bufnum)) > > ctrl->nand_stat |= > > IFC_NAND_EVTER_STAT_ECCER; > > break; > > } > > > > is_blank() checks for all 0xff's, so single-bit 0xfe in the data will > > result in_blank() == 0 and uncorrectable error being signaled. > > > > Should the driver be modified somehow? > > Yep, nand_check_erased_ecc_chunk() [1] is here to help you check this > case, unfortunately, it's not directly applicable here, because this > function takes regular pointers and not __iomem ones. You'll either > have to copy the data in an intermediate buffer before calling > nand_check_erased_ecc_chunk(), or cast the SRAM region to a void > pointer (which is usually not a good idea). The last option would be to > open code nand_check_erased_ecc_chunk(), but I'd really like to avoid > that (for maintainability concerns). Ok, took a look. __iomem is part of a problem, another part is that nand_check_erased_ecc_chunk() needs to actually write back 0xff's to undo the corruption, which would probably be bad idea to do in the iomem, and next one is that blank actually checks arbitrary number of regions, based on ecc.layout. So this could be used to simplify the code (if nand_check_erased_buf was exported; it is not), but it does not fix the problem as we still need to undo the corruption. Hints welcome, especially if you know right place where to put this checking. (BTW, switching to ecc.mode = ECC_SOFT will cause compatibility problems but should make the problem go away, right?) Thanks, Pavel diff --git a/drivers/mtd/nand/fsl_ifc_nand.c b/drivers/mtd/nand/fsl_ifc_nand.c index d1570f5..df02d4c 100644 --- a/drivers/mtd/nand/fsl_ifc_nand.c +++ b/drivers/mtd/nand/fsl_ifc_nand.c @@ -181,17 +181,15 @@ static int is_blank(struct mtd_info *mtd, unsigned int bufnum) struct mtd_oob_region oobregion = { }; int i, section = 0; - for (i = 0; i < mtd->writesize / 4; i++) { - if (__raw_readl(&mainarea[i]) != 0x) - return 0; - } + i = nand_check_erased_buf(&mainarea[i], mtd->writesize, 0); + if (i) + return 0; mtd_ooblayout_ecc(mtd, section++, &oobregion); while (oobregion.length) { - for (i = 0; i < oobregion.length; i++) { - if (__raw_readb(&oob[oobregion.offset + i]) != 0xff) - return 0; - } + i = nand_check_erased_buf(&oob[oobregion.offset], oobregion.length, 0); + if (i) + return 0; mtd_ooblayout_ecc(mtd, section++, &oobregion); } -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: fsl_ifc_nand: are blank pages protected by ECC?
On Thu, 20 Apr 2017 13:40:57 +0200 Pavel Machek wrote: > Hi! > > > > Would it make sense to only do hweight if *bitmap != ~0ULL ? Would it > > > make sense to only check for bitflips > bitflips_threshold each 128 > > > bytes or something like that? > > > > I didn't go as far as you did and simply assumed hweight32/64() were > > already optimized. Feel free to propose extra improvements. > > I'd propose this one (only compile tested, sorry, not sure how to test > this one). If we see ~0UL, there's no need for hweight, and no need to > check number of bitflips. So this should be net win. Looks good to me. Can you send a patch with a real commit message? Thanks, Boris > > Signed-off-by: Pavel Machek > > diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c > index b0524f8..96c27ec 100644 > --- a/drivers/mtd/nand/nand_base.c > +++ b/drivers/mtd/nand/nand_base.c > @@ -1357,7 +1357,10 @@ static int nand_check_erased_buf(void *buf, int len, > int bitflips_threshold) > > for (; len >= sizeof(long); >len -= sizeof(long), bitmap += sizeof(long)) { > - weight = hweight_long(*((unsigned long *)bitmap)); > + unsigned long d = *((unsigned long *)bitmap); > + if (d == ~0UL) > + continue; > + weight = hweight_long(d); > bitflips += BITS_PER_LONG - weight; > if (unlikely(bitflips > bitflips_threshold)) > return -EBADMSG; >
Re: fsl_ifc_nand: are blank pages protected by ECC?
Hi! > > Would it make sense to only do hweight if *bitmap != ~0ULL ? Would it > > make sense to only check for bitflips > bitflips_threshold each 128 > > bytes or something like that? > > I didn't go as far as you did and simply assumed hweight32/64() were > already optimized. Feel free to propose extra improvements. I'd propose this one (only compile tested, sorry, not sure how to test this one). If we see ~0UL, there's no need for hweight, and no need to check number of bitflips. So this should be net win. Signed-off-by: Pavel Machek diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c index b0524f8..96c27ec 100644 --- a/drivers/mtd/nand/nand_base.c +++ b/drivers/mtd/nand/nand_base.c @@ -1357,7 +1357,10 @@ static int nand_check_erased_buf(void *buf, int len, int bitflips_threshold) for (; len >= sizeof(long); len -= sizeof(long), bitmap += sizeof(long)) { - weight = hweight_long(*((unsigned long *)bitmap)); + unsigned long d = *((unsigned long *)bitmap); + if (d == ~0UL) + continue; + weight = hweight_long(d); bitflips += BITS_PER_LONG - weight; if (unlikely(bitflips > bitflips_threshold)) return -EBADMSG; -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: fsl_ifc_nand: are blank pages protected by ECC?
On Thu, 20 Apr 2017 00:15:07 +0200 Pavel Machek wrote: > Hi! > > > > We have some problems with fsl_ifc_nand ... in the old kernels, but > > > this one does not seem to be fixed in v4.11, either. > > > > > > UBIFS complains: > > > > > > UBIFS error (pid 931): ubifs_scan: corrupt empty space at LEB 282:252630 > > > UBIFS error (pid 931): ubifs_scanned_corruption: corruption at LEB > > > 282:252630 > > > UBIFS error (pid 931): ubifs_scanned_corruption: first 1322 bytes from > > > LEB 282:252630 > > > UBIFS error (pid 931): ubifs_scan: LEB 282 scanning failed > > > > > > Possible explanation is here: > > > > > > https://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/289605 > > > > > > # I see on the forum that this issue has been raised before - my > > > # understanding is that the omap2 nand driver does not perform ECC > > > # detection/correction on empty pages so when UBIFS checks the empty > > > # space data and doesn't read all 0xFF then it fails and mounts > > > # read-only. I didn't find any good solution - only a workaround to > > > # remove the UBIFS check.. > > > > > > So I checked fsl_ifc_nand.c in v4.11-rc, and yes, it seems to have the > > > same problem: > > > > > > if (errors == 15) { > > > /* > > > * Uncorrectable error. > > > * OK only if the whole page is blank. > > > * > > > * We disable ECCER reporting due to... > > > * erratum IFC-A002770 -- so report it > > > now if we > > > * see an uncorrectable error in ECCSTAT. > > > */ > > > if (!is_blank(mtd, bufnum)) > > > ctrl->nand_stat |= > > > IFC_NAND_EVTER_STAT_ECCER; > > > break; > > > } > > > > > > is_blank() checks for all 0xff's, so single-bit 0xfe in the data will > > > result in_blank() == 0 and uncorrectable error being signaled. > > > > > > Should the driver be modified somehow? > > > > Yep, nand_check_erased_ecc_chunk() [1] is here to help you check this > > case, unfortunately, it's not directly applicable here, because this > > function takes regular pointers and not __iomem ones. You'll either > > have to copy the data in an intermediate buffer before calling > > nand_check_erased_ecc_chunk(), or cast the SRAM region to a void > > pointer (which is usually not a good idea). The last option would be to > > open code nand_check_erased_ecc_chunk(), but I'd really like to avoid > > that (for maintainability concerns). > > Ok, thanks a lot for the pointer, that should be doable. > > Core of the code is: > > 1357 for (; len >= sizeof(long); > 1358 len -= sizeof(long), bitmap += sizeof(long)) { > 1359 weight = hweight_long(*((unsigned long > *)bitmap)); > 1360 bitflips += BITS_PER_LONG - weight; > 1361 if (unlikely(bitflips > bitflips_threshold)) > 1362 return -EBADMSG; > 1363 } > > Someone clearly optimized this code (took care to do long accesses > etc), but afaict hweight is quite a heavy operation: > > _GLOBAL(__arch_hweight32) > BEGIN_FTR_SECTION > b __sw_hweight32 > nop > nop > nop > nop > nop > nop > FTR_SECTION_ELSE > BEGIN_FTR_SECTION_NESTED(51) > PPC_POPCNTB(R3,R3) > srdir4,r3,16 > add r3,r4,r3 > srdir4,r3,8 > add r3,r4,r3 > clrldi r3,r3,64-8 > blr > FTR_SECTION_ELSE_NESTED(51) > PPC_POPCNTW(R3,R3) > clrldi r3,r3,64-8 > blr > ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_POPCNTD, 51) > ALT_FTR_SECTION_END_IFCLR(CPU_FTR_POPCNTB) > EXPORT_SYMBOL(__arch_hweight32) > > Would it make sense to only do hweight if *bitmap != ~0ULL ? Would it > make sense to only check for bitflips > bitflips_threshold each 128 > bytes or something like that? I didn't go as far as you did and simply assumed hweight32/64() were already optimized. Feel free to propose extra improvements.
Re: fsl_ifc_nand: are blank pages protected by ECC?
Hi! > > We have some problems with fsl_ifc_nand ... in the old kernels, but > > this one does not seem to be fixed in v4.11, either. > > > > UBIFS complains: > > > > UBIFS error (pid 931): ubifs_scan: corrupt empty space at LEB 282:252630 > > UBIFS error (pid 931): ubifs_scanned_corruption: corruption at LEB > > 282:252630 > > UBIFS error (pid 931): ubifs_scanned_corruption: first 1322 bytes from LEB > > 282:252630 > > UBIFS error (pid 931): ubifs_scan: LEB 282 scanning failed > > > > Possible explanation is here: > > > > https://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/289605 > > > > # I see on the forum that this issue has been raised before - my > > # understanding is that the omap2 nand driver does not perform ECC > > # detection/correction on empty pages so when UBIFS checks the empty > > # space data and doesn't read all 0xFF then it fails and mounts > > # read-only. I didn't find any good solution - only a workaround to > > # remove the UBIFS check.. > > > > So I checked fsl_ifc_nand.c in v4.11-rc, and yes, it seems to have the > > same problem: > > > > if (errors == 15) { > > /* > > * Uncorrectable error. > > * OK only if the whole page is blank. > > * > > * We disable ECCER reporting due to... > > * erratum IFC-A002770 -- so report it now > > if we > > * see an uncorrectable error in ECCSTAT. > > */ > > if (!is_blank(mtd, bufnum)) > > ctrl->nand_stat |= > > IFC_NAND_EVTER_STAT_ECCER; > > break; > > } > > > > is_blank() checks for all 0xff's, so single-bit 0xfe in the data will > > result in_blank() == 0 and uncorrectable error being signaled. > > > > Should the driver be modified somehow? > > Yep, nand_check_erased_ecc_chunk() [1] is here to help you check this > case, unfortunately, it's not directly applicable here, because this > function takes regular pointers and not __iomem ones. You'll either > have to copy the data in an intermediate buffer before calling > nand_check_erased_ecc_chunk(), or cast the SRAM region to a void > pointer (which is usually not a good idea). The last option would be to > open code nand_check_erased_ecc_chunk(), but I'd really like to avoid > that (for maintainability concerns). Ok, thanks a lot for the pointer, that should be doable. Core of the code is: 1357 for (; len >= sizeof(long); 1358 len -= sizeof(long), bitmap += sizeof(long)) { 1359 weight = hweight_long(*((unsigned long *)bitmap)); 1360 bitflips += BITS_PER_LONG - weight; 1361 if (unlikely(bitflips > bitflips_threshold)) 1362 return -EBADMSG; 1363 } Someone clearly optimized this code (took care to do long accesses etc), but afaict hweight is quite a heavy operation: _GLOBAL(__arch_hweight32) BEGIN_FTR_SECTION b __sw_hweight32 nop nop nop nop nop nop FTR_SECTION_ELSE BEGIN_FTR_SECTION_NESTED(51) PPC_POPCNTB(R3,R3) srdir4,r3,16 add r3,r4,r3 srdir4,r3,8 add r3,r4,r3 clrldi r3,r3,64-8 blr FTR_SECTION_ELSE_NESTED(51) PPC_POPCNTW(R3,R3) clrldi r3,r3,64-8 blr ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_POPCNTD, 51) ALT_FTR_SECTION_END_IFCLR(CPU_FTR_POPCNTB) EXPORT_SYMBOL(__arch_hweight32) Would it make sense to only do hweight if *bitmap != ~0ULL ? Would it make sense to only check for bitflips > bitflips_threshold each 128 bytes or something like that? Thanks and best regards, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: fsl_ifc_nand: are blank pages protected by ECC?
On Wed, 19 Apr 2017 14:13:32 +0200 Pavel Machek wrote: > Hi! > > We have some problems with fsl_ifc_nand ... in the old kernels, but > this one does not seem to be fixed in v4.11, either. > > UBIFS complains: > > UBIFS error (pid 931): ubifs_scan: corrupt empty space at LEB 282:252630 > UBIFS error (pid 931): ubifs_scanned_corruption: corruption at LEB 282:252630 > UBIFS error (pid 931): ubifs_scanned_corruption: first 1322 bytes from LEB > 282:252630 > UBIFS error (pid 931): ubifs_scan: LEB 282 scanning failed > > Possible explanation is here: > > https://e2e.ti.com/support/dsp/davinci_digital_media_processors/f/716/t/289605 > > # I see on the forum that this issue has been raised before - my > # understanding is that the omap2 nand driver does not perform ECC > # detection/correction on empty pages so when UBIFS checks the empty > # space data and doesn't read all 0xFF then it fails and mounts > # read-only. I didn't find any good solution - only a workaround to > # remove the UBIFS check.. > > So I checked fsl_ifc_nand.c in v4.11-rc, and yes, it seems to have the > same problem: > > if (errors == 15) { > /* > * Uncorrectable error. > * OK only if the whole page is blank. > * > * We disable ECCER reporting due to... > * erratum IFC-A002770 -- so report it now if > we > * see an uncorrectable error in ECCSTAT. > */ > if (!is_blank(mtd, bufnum)) > ctrl->nand_stat |= > IFC_NAND_EVTER_STAT_ECCER; > break; > } > > is_blank() checks for all 0xff's, so single-bit 0xfe in the data will > result in_blank() == 0 and uncorrectable error being signaled. > > Should the driver be modified somehow? Yep, nand_check_erased_ecc_chunk() [1] is here to help you check this case, unfortunately, it's not directly applicable here, because this function takes regular pointers and not __iomem ones. You'll either have to copy the data in an intermediate buffer before calling nand_check_erased_ecc_chunk(), or cast the SRAM region to a void pointer (which is usually not a good idea). The last option would be to open code nand_check_erased_ecc_chunk(), but I'd really like to avoid that (for maintainability concerns). [1]http://lxr.free-electrons.com/source/drivers/mtd/nand/nand_base.c#L1414