Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
On Fri, 21 Aug 2009 10:47:09 +0530 vimal singh vimal.neww...@gmail.com wrote: Just one question: did you enabled MTD_NAND_ECC_SMC in configs? It is automagically selected when you select the NDFC driver. Cheers, Sean ___ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot
Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
On Friday 21 August 2009 07:17:09 vimal singh wrote: diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c index 89bf85a..497e175 100644 --- a/drivers/mtd/nand/ndfc.c +++ b/drivers/mtd/nand/ndfc.c @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd, wmb(); ecc = in_be32(ndfc-ndfcbase + NDFC_ECC); - /* The NDFC uses Smart Media (SMC) bytes order */ - ecc_code[0] = p[2]; - ecc_code[1] = p[1]; + ecc_code[0] = p[1]; + ecc_code[1] = p[2]; ecc_code[2] = p[3]; return 0; Does anybody see a problem with my method of reproducing the bug? This bug is deadly for our customers. I don't want to make the change unless it is absolutely necessary.. Just one question: did you enabled MTD_NAND_ECC_SMC in configs? Yes, MTD_NAND_ECC_SMC is selected via Kconfig for this driver. Cheers, Stefan -- DENX Software Engineering GmbH, MD: Wolfgang Denk Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: off...@denx.de ___ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot
Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
Hi Vimal, With the current ndfc code, the error correction gets the bits wrong. Switching it back to the original way and the correction is correct. diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c index 89bf85a..497e175 100644 --- a/drivers/mtd/nand/ndfc.c +++ b/drivers/mtd/nand/ndfc.c @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd, wmb(); ecc = in_be32(ndfc-ndfcbase + NDFC_ECC); - /* The NDFC uses Smart Media (SMC) bytes order */ - ecc_code[0] = p[2]; - ecc_code[1] = p[1]; + ecc_code[0] = p[1]; + ecc_code[1] = p[2]; ecc_code[2] = p[3]; return 0; Does anybody see a problem with my method of reproducing the bug? This bug is deadly for our customers. I don't want to make the change unless it is absolutely necessary.. Just one question: did you enabled MTD_NAND_ECC_SMC in configs? Yes, it was set. Best Regards, Victor Gallardo ___ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot
Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
Hi Feng, On Friday 21 August 2009 01:42:42 Feng Kan wrote: We had a board with high number of correctable ECC errors. Which crashed the jffs when it was miss correcting the wrong byte location. OK, thanks. Do you want me to submit a patch for this, or do you prefer to do it. Sure, please go ahead and send a patch to fix this in U-Boot as well. Thanks. Cheers, Stefan -- DENX Software Engineering GmbH, MD: Wolfgang Denk Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: off...@denx.de ___ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot
Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
On Thu, 20 Aug 2009 07:01:21 +0200 Stefan Roese s...@denx.de wrote: On Thursday 20 August 2009 06:38:51 Sean MacLennan wrote: I see other boards using SMC as well, can someone comment on the change I am proposing. Should I change the correction algorithm or the calculate function? If the later is preferred it would mean the change must be pushed in both U-Boot and Linux. Odds are the calculate function is wrong. The correction algo is used by many nand drivers, I *assume* it is correct. The calculate function was set to agree with u-boot (1.3.0). Yes, it seems that you changed the order in the calculation function while reworking the NDFC driver for arch/powerpc. So we should probably change this order back to the original version. And change it in U-Boot as well. BTW: I didn't see any problems with ECC so far with the current code. Feng, how did you spot this problem? Ok, I think I have reproduced the problem programmatically. Basically, I force a one bit error with the following patch: diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c index 8c21b89..91dd5b4 100644 --- a/drivers/mtd/nand/nand_base.c +++ b/drivers/mtd/nand/nand_base.c @@ -1628,11 +1628,22 @@ static void nand_write_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip, uint8_t *ecc_calc = chip-buffers-ecccalc; const uint8_t *p = buf; uint32_t *eccpos = chip-ecc.layout-eccpos; + static int count; for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) { chip-ecc.hwctl(mtd, NAND_ECC_WRITE); - chip-write_buf(mtd, p, eccsize); - chip-ecc.calculate(mtd, p, ecc_calc[i]); + if (count == 0) { + count = 1; + printk(Corrupt one bit: %08x = %08x\n, + *p, *p ^ 8); + *(uint8_t *)p ^= 8; + chip-write_buf(mtd, p, eccsize); + *(uint8_t *)p ^= 8; + nand_calculate_ecc(mtd, p, ecc_calc[i]); + } else { + chip-write_buf(mtd, p, eccsize); + chip-ecc.calculate(mtd, p, ecc_calc[i]); + } } for (i = 0; i chip-ecc.total; i++) Basically I write a one bit error to the NAND, but calculate with the correct bit. This assumes nand_calculate_ecc is correct. I then added debugs to the correction to make sure it corrected properly: diff --git a/drivers/mtd/nand/nand_ecc.c b/drivers/mtd/nand/nand_ecc.c index c0cb87d..57dcaa1 100644 --- a/drivers/mtd/nand/nand_ecc.c +++ b/drivers/mtd/nand/nand_ecc.c @@ -483,14 +483,20 @@ int nand_correct_data(struct mtd_info *mtd, unsigned char *buf, byte_addr = (addressbits[b2 0x3] 8) + (addressbits[b1] 4) + addressbits[b0]; bit_addr = addressbits[b2 2]; + + printk(Single bit error: correct %08x = %08x\n, + buf[byte_addr], buf[byte_addr] ^ (1 bit_addr)); + /* flip the bit */ buf[byte_addr] ^= (1 bit_addr); return 1; } /* count nr of bits; use table lookup, faster than calculating it */ - if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1) + if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1) { + printk(ECC DATA BAD\n); // SAM DBG return 1; /* error in ecc data; no action needed */ + } printk(KERN_ERR uncorrectable error : ); return -1; With the current ndfc code, the error correction gets the bits wrong. Switching it back to the original way and the correction is correct. diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c index 89bf85a..497e175 100644 --- a/drivers/mtd/nand/ndfc.c +++ b/drivers/mtd/nand/ndfc.c @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd, wmb(); ecc = in_be32(ndfc-ndfcbase + NDFC_ECC); - /* The NDFC uses Smart Media (SMC) bytes order */ - ecc_code[0] = p[2]; - ecc_code[1] = p[1]; + ecc_code[0] = p[1]; + ecc_code[1] = p[2]; ecc_code[2] = p[3]; return 0; Does anybody see a problem with my method of reproducing the bug? This bug is deadly for our customers. I don't want to make the change unless it is absolutely necessary. Cheers, Sean ___ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot
Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
Hi Sean, The change is necessary in both Linux and u-boot. Without this change customer are seeing the problem. Best Regards, Victor Gallardo -Original Message- From: linuxppc-dev-bounces+vgallardo=amcc@lists.ozlabs.org [mailto:linuxppc-dev- bounces+vgallardo=amcc@lists.ozlabs.org] On Behalf Of Sean MacLennan Sent: Thursday, August 20, 2009 12:37 PM To: Stefan Roese Cc: u-boot@lists.denx.de; Feng Kan; linux-...@lists.infradead.org; linuxppc-...@ozlabs.org Subject: Re: [U-Boot] NAND ECC Error with wrong SMC ording bug On Thu, 20 Aug 2009 07:01:21 +0200 Stefan Roese s...@denx.de wrote: On Thursday 20 August 2009 06:38:51 Sean MacLennan wrote: I see other boards using SMC as well, can someone comment on the change I am proposing. Should I change the correction algorithm or the calculate function? If the later is preferred it would mean the change must be pushed in both U-Boot and Linux. Odds are the calculate function is wrong. The correction algo is used by many nand drivers, I *assume* it is correct. The calculate function was set to agree with u-boot (1.3.0). Yes, it seems that you changed the order in the calculation function while reworking the NDFC driver for arch/powerpc. So we should probably change this order back to the original version. And change it in U-Boot as well. BTW: I didn't see any problems with ECC so far with the current code. Feng, how did you spot this problem? Ok, I think I have reproduced the problem programmatically. Basically, I force a one bit error with the following patch: diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c index 8c21b89..91dd5b4 100644 --- a/drivers/mtd/nand/nand_base.c +++ b/drivers/mtd/nand/nand_base.c @@ -1628,11 +1628,22 @@ static void nand_write_page_hwecc(struct mtd_info *mtd, struct nand_chip *chip, uint8_t *ecc_calc = chip-buffers-ecccalc; const uint8_t *p = buf; uint32_t *eccpos = chip-ecc.layout-eccpos; + static int count; for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) { chip-ecc.hwctl(mtd, NAND_ECC_WRITE); - chip-write_buf(mtd, p, eccsize); - chip-ecc.calculate(mtd, p, ecc_calc[i]); + if (count == 0) { + count = 1; + printk(Corrupt one bit: %08x = %08x\n, +*p, *p ^ 8); + *(uint8_t *)p ^= 8; + chip-write_buf(mtd, p, eccsize); + *(uint8_t *)p ^= 8; + nand_calculate_ecc(mtd, p, ecc_calc[i]); + } else { + chip-write_buf(mtd, p, eccsize); + chip-ecc.calculate(mtd, p, ecc_calc[i]); + } } for (i = 0; i chip-ecc.total; i++) Basically I write a one bit error to the NAND, but calculate with the correct bit. This assumes nand_calculate_ecc is correct. I then added debugs to the correction to make sure it corrected properly: diff --git a/drivers/mtd/nand/nand_ecc.c b/drivers/mtd/nand/nand_ecc.c index c0cb87d..57dcaa1 100644 --- a/drivers/mtd/nand/nand_ecc.c +++ b/drivers/mtd/nand/nand_ecc.c @@ -483,14 +483,20 @@ int nand_correct_data(struct mtd_info *mtd, unsigned char *buf, byte_addr = (addressbits[b2 0x3] 8) + (addressbits[b1] 4) + addressbits[b0]; bit_addr = addressbits[b2 2]; + + printk(Single bit error: correct %08x = %08x\n, +buf[byte_addr], buf[byte_addr] ^ (1 bit_addr)); + /* flip the bit */ buf[byte_addr] ^= (1 bit_addr); return 1; } /* count nr of bits; use table lookup, faster than calculating it */ - if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1) + if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1) { + printk(ECC DATA BAD\n); // SAM DBG return 1; /* error in ecc data; no action needed */ + } printk(KERN_ERR uncorrectable error : ); return -1; With the current ndfc code, the error correction gets the bits wrong. Switching it back to the original way and the correction is correct. diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c index 89bf85a..497e175 100644 --- a/drivers/mtd/nand/ndfc.c +++ b/drivers/mtd/nand/ndfc.c @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd, wmb(); ecc = in_be32(ndfc-ndfcbase + NDFC_ECC); - /* The NDFC uses Smart Media (SMC) bytes order */ - ecc_code[0] = p[2]; - ecc_code[1] = p[1]; + ecc_code[0] = p[1]; + ecc_code[1] = p[2]; ecc_code[2] = p[3]; return 0; Does anybody see a problem with my method of reproducing the bug? This bug is deadly for our customers. I don't want to make
Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
Hi Stefan: We had a board with high number of correctable ECC errors. Which crashed the jffs when it was miss correcting the wrong byte location. Do you want me to submit a patch for this, or do you prefer to do it. I am submitting a patch for linux right now. Feng Kan AMCC Software On 08/19/2009 10:01 PM, Stefan Roese wrote: On Thursday 20 August 2009 06:38:51 Sean MacLennan wrote: I see other boards using SMC as well, can someone comment on the change I am proposing. Should I change the correction algorithm or the calculate function? If the later is preferred it would mean the change must be pushed in both U-Boot and Linux. Odds are the calculate function is wrong. The correction algo is used by many nand drivers, I *assume* it is correct. The calculate function was set to agree with u-boot (1.3.0). Yes, it seems that you changed the order in the calculation function while reworking the NDFC driver for arch/powerpc. So we should probably change this order back to the original version. And change it in U-Boot as well. BTW: I didn't see any problems with ECC so far with the current code. Feng, how did you spot this problem? Cheers, Stefan -- DENX Software Engineering GmbH, MD: Wolfgang Denk Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: off...@denx.de ___ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot
Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
snip With the current ndfc code, the error correction gets the bits wrong. Switching it back to the original way and the correction is correct. diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c index 89bf85a..497e175 100644 --- a/drivers/mtd/nand/ndfc.c +++ b/drivers/mtd/nand/ndfc.c @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd, wmb(); ecc = in_be32(ndfc-ndfcbase + NDFC_ECC); - /* The NDFC uses Smart Media (SMC) bytes order */ - ecc_code[0] = p[2]; - ecc_code[1] = p[1]; + ecc_code[0] = p[1]; + ecc_code[1] = p[2]; ecc_code[2] = p[3]; return 0; Does anybody see a problem with my method of reproducing the bug? This bug is deadly for our customers. I don't want to make the change unless it is absolutely necessary.. Just one question: did you enabled MTD_NAND_ECC_SMC in configs? -vimal Cheers, Sean __ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ ___ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot
[U-Boot] NAND ECC Error with wrong SMC ording bug
Hi All: It seems that the ECC correction is broken on the Linux with the 4xx NDFC driver. It uses the SMC order when reading the ECC code. 2-1-3 static int ndfc_calculate_ecc(struct mtd_info *mtd, const u_char *dat, u_char *ecc_code) { struct ndfc_controller *ndfc = ndfc_ctrl; uint32_t ecc; uint8_t *p = (uint8_t *)ecc; wmb(); ecc = in_be32(ndfc-ndfcbase + NDFC_ECC); /* The NDFC uses Smart Media (SMC) bytes order */ ecc_code[0] = p[2]; ecc_code[1] = p[1]; ecc_code[2] = p[3]; return 0; } However, when in the correction function, the byte address order is again reverses causing incorrect byte location. * performace it does not make any difference */ if (eccsize_mult == 1) byte_addr = (addressbits[b0] 4) + addressbits[b1]; The above really should be byte_addr = (addressbits[b1] 4) + addressbits[b0]; else byte_addr = (addressbits[b2 0x3] 8) + (addressbits[b1] 4) + addressbits[b0]; bit_addr = addressbits[b2 2]; /* flip the bit */ buf[byte_addr] ^= (1 bit_addr); printk(KERN_INFO Corrected b[0] 0x%x b[1]0x%x\n, b0, b1); printk(KERN_INFO cal ecc b[0] 0x%x b[1]0x%x\n, calc_ecc[0] , calc_ecc[1]); printk(KERN_INFO read ecc b[0] 0x%x b[1]0x%x\n, read_ecc[0] , read_ecc[1]); return 1; I see other boards using SMC as well, can someone comment on the change I am proposing. Should I change the correction algorithm or the calculate function? If the later is preferred it would mean the change must be pushed in both U-Boot and Linux. Feng Kan AMCC Software ___ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot
Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
On Wed, 19 Aug 2009 16:16:54 -0700 Feng Kan f...@amcc.com wrote: I see other boards using SMC as well, can someone comment on the change I am proposing. Should I change the correction algorithm or the calculate function? If the later is preferred it would mean the change must be pushed in both U-Boot and Linux. Odds are the calculate function is wrong. The correction algo is used by many nand drivers, I *assume* it is correct. The calculate function was set to agree with u-boot (1.3.0). Cheers, Sean P.S. Yes, I know the u-boot is an ancient version :( ___ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot
Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
On Thursday 20 August 2009 06:38:51 Sean MacLennan wrote: I see other boards using SMC as well, can someone comment on the change I am proposing. Should I change the correction algorithm or the calculate function? If the later is preferred it would mean the change must be pushed in both U-Boot and Linux. Odds are the calculate function is wrong. The correction algo is used by many nand drivers, I *assume* it is correct. The calculate function was set to agree with u-boot (1.3.0). Yes, it seems that you changed the order in the calculation function while reworking the NDFC driver for arch/powerpc. So we should probably change this order back to the original version. And change it in U-Boot as well. BTW: I didn't see any problems with ECC so far with the current code. Feng, how did you spot this problem? Cheers, Stefan -- DENX Software Engineering GmbH, MD: Wolfgang Denk Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: off...@denx.de ___ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot