Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
On Fri, 21 Aug 2009 10:47:09 +0530 vimal singh vimal.neww...@gmail.com wrote: Just one question: did you enabled MTD_NAND_ECC_SMC in configs? It is automagically selected when you select the NDFC driver. Cheers, Sean ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
On Friday 21 August 2009 07:17:09 vimal singh wrote: diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c index 89bf85a..497e175 100644 --- a/drivers/mtd/nand/ndfc.c +++ b/drivers/mtd/nand/ndfc.c @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd, wmb(); ecc = in_be32(ndfc-ndfcbase + NDFC_ECC); - /* The NDFC uses Smart Media (SMC) bytes order */ - ecc_code[0] = p[2]; - ecc_code[1] = p[1]; + ecc_code[0] = p[1]; + ecc_code[1] = p[2]; ecc_code[2] = p[3]; return 0; Does anybody see a problem with my method of reproducing the bug? This bug is deadly for our customers. I don't want to make the change unless it is absolutely necessary.. Just one question: did you enabled MTD_NAND_ECC_SMC in configs? Yes, MTD_NAND_ECC_SMC is selected via Kconfig for this driver. Cheers, Stefan -- DENX Software Engineering GmbH, MD: Wolfgang Denk Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: off...@denx.de ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
Hi Vimal, With the current ndfc code, the error correction gets the bits wrong. Switching it back to the original way and the correction is correct. diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c index 89bf85a..497e175 100644 --- a/drivers/mtd/nand/ndfc.c +++ b/drivers/mtd/nand/ndfc.c @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd, wmb(); ecc = in_be32(ndfc-ndfcbase + NDFC_ECC); - /* The NDFC uses Smart Media (SMC) bytes order */ - ecc_code[0] = p[2]; - ecc_code[1] = p[1]; + ecc_code[0] = p[1]; + ecc_code[1] = p[2]; ecc_code[2] = p[3]; return 0; Does anybody see a problem with my method of reproducing the bug? This bug is deadly for our customers. I don't want to make the change unless it is absolutely necessary.. Just one question: did you enabled MTD_NAND_ECC_SMC in configs? Yes, it was set. Best Regards, Victor Gallardo ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
Hi Feng, On Friday 21 August 2009 01:42:42 Feng Kan wrote: We had a board with high number of correctable ECC errors. Which crashed the jffs when it was miss correcting the wrong byte location. OK, thanks. Do you want me to submit a patch for this, or do you prefer to do it. Sure, please go ahead and send a patch to fix this in U-Boot as well. Thanks. Cheers, Stefan -- DENX Software Engineering GmbH, MD: Wolfgang Denk Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: off...@denx.de ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
On Thu, 20 Aug 2009 07:01:21 +0200 Stefan Roese s...@denx.de wrote: On Thursday 20 August 2009 06:38:51 Sean MacLennan wrote: I see other boards using SMC as well, can someone comment on the change I am proposing. Should I change the correction algorithm or the calculate function? If the later is preferred it would mean the change must be pushed in both U-Boot and Linux. Odds are the calculate function is wrong. The correction algo is used by many nand drivers, I *assume* it is correct. The calculate function was set to agree with u-boot (1.3.0). Yes, it seems that you changed the order in the calculation function while reworking the NDFC driver for arch/powerpc. So we should probably change this order back to the original version. And change it in U-Boot as well. BTW: I didn't see any problems with ECC so far with the current code. Feng, how did you spot this problem? Ok, I think I have reproduced the problem programmatically. Basically, I force a one bit error with the following patch: diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c index 8c21b89..91dd5b4 100644 --- a/drivers/mtd/nand/nand_base.c +++ b/drivers/mtd/nand/nand_base.c @@ -1628,11 +1628,22 @@ static void nand_write_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip, uint8_t *ecc_calc = chip-buffers-ecccalc; const uint8_t *p = buf; uint32_t *eccpos = chip-ecc.layout-eccpos; + static int count; for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) { chip-ecc.hwctl(mtd, NAND_ECC_WRITE); - chip-write_buf(mtd, p, eccsize); - chip-ecc.calculate(mtd, p, ecc_calc[i]); + if (count == 0) { + count = 1; + printk(Corrupt one bit: %08x = %08x\n, + *p, *p ^ 8); + *(uint8_t *)p ^= 8; + chip-write_buf(mtd, p, eccsize); + *(uint8_t *)p ^= 8; + nand_calculate_ecc(mtd, p, ecc_calc[i]); + } else { + chip-write_buf(mtd, p, eccsize); + chip-ecc.calculate(mtd, p, ecc_calc[i]); + } } for (i = 0; i chip-ecc.total; i++) Basically I write a one bit error to the NAND, but calculate with the correct bit. This assumes nand_calculate_ecc is correct. I then added debugs to the correction to make sure it corrected properly: diff --git a/drivers/mtd/nand/nand_ecc.c b/drivers/mtd/nand/nand_ecc.c index c0cb87d..57dcaa1 100644 --- a/drivers/mtd/nand/nand_ecc.c +++ b/drivers/mtd/nand/nand_ecc.c @@ -483,14 +483,20 @@ int nand_correct_data(struct mtd_info *mtd, unsigned char *buf, byte_addr = (addressbits[b2 0x3] 8) + (addressbits[b1] 4) + addressbits[b0]; bit_addr = addressbits[b2 2]; + + printk(Single bit error: correct %08x = %08x\n, + buf[byte_addr], buf[byte_addr] ^ (1 bit_addr)); + /* flip the bit */ buf[byte_addr] ^= (1 bit_addr); return 1; } /* count nr of bits; use table lookup, faster than calculating it */ - if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1) + if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1) { + printk(ECC DATA BAD\n); // SAM DBG return 1; /* error in ecc data; no action needed */ + } printk(KERN_ERR uncorrectable error : ); return -1; With the current ndfc code, the error correction gets the bits wrong. Switching it back to the original way and the correction is correct. diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c index 89bf85a..497e175 100644 --- a/drivers/mtd/nand/ndfc.c +++ b/drivers/mtd/nand/ndfc.c @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd, wmb(); ecc = in_be32(ndfc-ndfcbase + NDFC_ECC); - /* The NDFC uses Smart Media (SMC) bytes order */ - ecc_code[0] = p[2]; - ecc_code[1] = p[1]; + ecc_code[0] = p[1]; + ecc_code[1] = p[2]; ecc_code[2] = p[3]; return 0; Does anybody see a problem with my method of reproducing the bug? This bug is deadly for our customers. I don't want to make the change unless it is absolutely necessary. Cheers, Sean ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: [U-Boot] NAND ECC Error with wrong SMC ording bug
Hi Sean, The change is necessary in both Linux and u-boot. Without this change customer are seeing the problem. Best Regards, Victor Gallardo -Original Message- From: linuxppc-dev-bounces+vgallardo=amcc@lists.ozlabs.org [mailto:linuxppc-dev- bounces+vgallardo=amcc@lists.ozlabs.org] On Behalf Of Sean MacLennan Sent: Thursday, August 20, 2009 12:37 PM To: Stefan Roese Cc: u-b...@lists.denx.de; Feng Kan; linux-...@lists.infradead.org; linuxppc-...@ozlabs.org Subject: Re: [U-Boot] NAND ECC Error with wrong SMC ording bug On Thu, 20 Aug 2009 07:01:21 +0200 Stefan Roese s...@denx.de wrote: On Thursday 20 August 2009 06:38:51 Sean MacLennan wrote: I see other boards using SMC as well, can someone comment on the change I am proposing. Should I change the correction algorithm or the calculate function? If the later is preferred it would mean the change must be pushed in both U-Boot and Linux. Odds are the calculate function is wrong. The correction algo is used by many nand drivers, I *assume* it is correct. The calculate function was set to agree with u-boot (1.3.0). Yes, it seems that you changed the order in the calculation function while reworking the NDFC driver for arch/powerpc. So we should probably change this order back to the original version. And change it in U-Boot as well. BTW: I didn't see any problems with ECC so far with the current code. Feng, how did you spot this problem? Ok, I think I have reproduced the problem programmatically. Basically, I force a one bit error with the following patch: diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c index 8c21b89..91dd5b4 100644 --- a/drivers/mtd/nand/nand_base.c +++ b/drivers/mtd/nand/nand_base.c @@ -1628,11 +1628,22 @@ static void nand_write_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip, uint8_t *ecc_calc = chip-buffers-ecccalc; const uint8_t *p = buf; uint32_t *eccpos = chip-ecc.layout-eccpos; + static int count; for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) { chip-ecc.hwctl(mtd, NAND_ECC_WRITE); - chip-write_buf(mtd, p, eccsize); - chip-ecc.calculate(mtd, p, ecc_calc[i]); + if (count == 0) { + count = 1; + printk(Corrupt one bit: %08x = %08x\n, +*p, *p ^ 8); + *(uint8_t *)p ^= 8; + chip-write_buf(mtd, p, eccsize); + *(uint8_t *)p ^= 8; + nand_calculate_ecc(mtd, p, ecc_calc[i]); + } else { + chip-write_buf(mtd, p, eccsize); + chip-ecc.calculate(mtd, p, ecc_calc[i]); + } } for (i = 0; i chip-ecc.total; i++) Basically I write a one bit error to the NAND, but calculate with the correct bit. This assumes nand_calculate_ecc is correct. I then added debugs to the correction to make sure it corrected properly: diff --git a/drivers/mtd/nand/nand_ecc.c b/drivers/mtd/nand/nand_ecc.c index c0cb87d..57dcaa1 100644 --- a/drivers/mtd/nand/nand_ecc.c +++ b/drivers/mtd/nand/nand_ecc.c @@ -483,14 +483,20 @@ int nand_correct_data(struct mtd_info *mtd, unsigned char *buf, byte_addr = (addressbits[b2 0x3] 8) + (addressbits[b1] 4) + addressbits[b0]; bit_addr = addressbits[b2 2]; + + printk(Single bit error: correct %08x = %08x\n, +buf[byte_addr], buf[byte_addr] ^ (1 bit_addr)); + /* flip the bit */ buf[byte_addr] ^= (1 bit_addr); return 1; } /* count nr of bits; use table lookup, faster than calculating it */ - if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1) + if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1) { + printk(ECC DATA BAD\n); // SAM DBG return 1; /* error in ecc data; no action needed */ + } printk(KERN_ERR uncorrectable error : ); return -1; With the current ndfc code, the error correction gets the bits wrong. Switching it back to the original way and the correction is correct. diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c index 89bf85a..497e175 100644 --- a/drivers/mtd/nand/ndfc.c +++ b/drivers/mtd/nand/ndfc.c @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd, wmb(); ecc = in_be32(ndfc-ndfcbase + NDFC_ECC); - /* The NDFC uses Smart Media (SMC) bytes order */ - ecc_code[0] = p[2]; - ecc_code[1] = p[1]; + ecc_code[0] = p[1]; + ecc_code[1] = p[2]; ecc_code[2] = p[3]; return 0; Does anybody see a problem with my method of reproducing the bug? This bug is deadly for our customers. I don't want to make
Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
Hi Stefan: We had a board with high number of correctable ECC errors. Which crashed the jffs when it was miss correcting the wrong byte location. Do you want me to submit a patch for this, or do you prefer to do it. I am submitting a patch for linux right now. Feng Kan AMCC Software On 08/19/2009 10:01 PM, Stefan Roese wrote: On Thursday 20 August 2009 06:38:51 Sean MacLennan wrote: I see other boards using SMC as well, can someone comment on the change I am proposing. Should I change the correction algorithm or the calculate function? If the later is preferred it would mean the change must be pushed in both U-Boot and Linux. Odds are the calculate function is wrong. The correction algo is used by many nand drivers, I *assume* it is correct. The calculate function was set to agree with u-boot (1.3.0). Yes, it seems that you changed the order in the calculation function while reworking the NDFC driver for arch/powerpc. So we should probably change this order back to the original version. And change it in U-Boot as well. BTW: I didn't see any problems with ECC so far with the current code. Feng, how did you spot this problem? Cheers, Stefan -- DENX Software Engineering GmbH, MD: Wolfgang Denk Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: off...@denx.de ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
snip With the current ndfc code, the error correction gets the bits wrong. Switching it back to the original way and the correction is correct. diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c index 89bf85a..497e175 100644 --- a/drivers/mtd/nand/ndfc.c +++ b/drivers/mtd/nand/ndfc.c @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd, wmb(); ecc = in_be32(ndfc-ndfcbase + NDFC_ECC); - /* The NDFC uses Smart Media (SMC) bytes order */ - ecc_code[0] = p[2]; - ecc_code[1] = p[1]; + ecc_code[0] = p[1]; + ecc_code[1] = p[2]; ecc_code[2] = p[3]; return 0; Does anybody see a problem with my method of reproducing the bug? This bug is deadly for our customers. I don't want to make the change unless it is absolutely necessary.. Just one question: did you enabled MTD_NAND_ECC_SMC in configs? -vimal Cheers, Sean __ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
On Thursday 20 August 2009 06:38:51 Sean MacLennan wrote: I see other boards using SMC as well, can someone comment on the change I am proposing. Should I change the correction algorithm or the calculate function? If the later is preferred it would mean the change must be pushed in both U-Boot and Linux. Odds are the calculate function is wrong. The correction algo is used by many nand drivers, I *assume* it is correct. The calculate function was set to agree with u-boot (1.3.0). Yes, it seems that you changed the order in the calculation function while reworking the NDFC driver for arch/powerpc. So we should probably change this order back to the original version. And change it in U-Boot as well. BTW: I didn't see any problems with ECC so far with the current code. Feng, how did you spot this problem? Cheers, Stefan -- DENX Software Engineering GmbH, MD: Wolfgang Denk Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: off...@denx.de ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev