Re: [U-Boot] NAND ECC Error with wrong SMC ording bug

2009-08-21 Thread Sean MacLennan
On Fri, 21 Aug 2009 10:47:09 +0530
vimal singh vimal.neww...@gmail.com wrote:

 Just one question: did you enabled MTD_NAND_ECC_SMC in configs?

It is automagically selected when you select the NDFC driver.

Cheers,
   Sean
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [U-Boot] NAND ECC Error with wrong SMC ording bug

2009-08-21 Thread Stefan Roese
On Friday 21 August 2009 07:17:09 vimal singh wrote:
  diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c
  index 89bf85a..497e175 100644
  --- a/drivers/mtd/nand/ndfc.c
  +++ b/drivers/mtd/nand/ndfc.c
  @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd,
 
 wmb();
 ecc = in_be32(ndfc-ndfcbase + NDFC_ECC);
  -   /* The NDFC uses Smart Media (SMC) bytes order */
  -   ecc_code[0] = p[2];
  -   ecc_code[1] = p[1];
  +   ecc_code[0] = p[1];
  +   ecc_code[1] = p[2];
 ecc_code[2] = p[3];
 
 return 0;
 
  Does anybody see a problem with my method of reproducing the bug? This
  bug is deadly for our customers. I don't want to make the change unless
  it is absolutely necessary..

 Just one question: did you enabled MTD_NAND_ECC_SMC in configs?

Yes, MTD_NAND_ECC_SMC is selected via Kconfig for this driver.

Cheers,
Stefan

--
DENX Software Engineering GmbH,  MD: Wolfgang Denk  Detlev Zundel
HRB 165235 Munich,  Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: off...@denx.de
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [U-Boot] NAND ECC Error with wrong SMC ording bug

2009-08-21 Thread Victor Gallardo
Hi Vimal,

  With the current ndfc code, the error correction gets the bits wrong.
  Switching it back to the original way and the correction is correct.
 
  diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c
  index 89bf85a..497e175 100644
  --- a/drivers/mtd/nand/ndfc.c
  +++ b/drivers/mtd/nand/ndfc.c
  @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd,
 
         wmb();
         ecc = in_be32(ndfc-ndfcbase + NDFC_ECC);
  -       /* The NDFC uses Smart Media (SMC) bytes order */
  -       ecc_code[0] = p[2];
  -       ecc_code[1] = p[1];
  +       ecc_code[0] = p[1];
  +       ecc_code[1] = p[2];
         ecc_code[2] = p[3];
 
         return 0;
 
  Does anybody see a problem with my method of reproducing the bug? This
  bug is deadly for our customers. I don't want to make the change unless
  it is absolutely necessary..
 
 Just one question: did you enabled MTD_NAND_ECC_SMC in configs?

Yes, it was set.

Best Regards,

Victor Gallardo
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [U-Boot] NAND ECC Error with wrong SMC ording bug

2009-08-21 Thread Stefan Roese
Hi Feng,

On Friday 21 August 2009 01:42:42 Feng Kan wrote:
 We had a board with high number of correctable ECC errors. Which crashed
 the jffs when it
 was miss correcting the wrong byte location.

OK, thanks.

 Do you want me to submit a patch for this, or do you prefer to do it.

Sure, please go ahead and send a patch to fix this in U-Boot as well.

Thanks.

Cheers,
Stefan

--
DENX Software Engineering GmbH,  MD: Wolfgang Denk  Detlev Zundel
HRB 165235 Munich,  Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: off...@denx.de
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [U-Boot] NAND ECC Error with wrong SMC ording bug

2009-08-20 Thread Sean MacLennan
On Thu, 20 Aug 2009 07:01:21 +0200
Stefan Roese s...@denx.de wrote:

 On Thursday 20 August 2009 06:38:51 Sean MacLennan wrote:
   I see other boards using SMC as well, can someone comment on the
   change I am proposing.
   Should I change the correction algorithm or the calculate
   function? If the later is preferred
   it would mean the change must be pushed in both U-Boot and Linux.
 
  Odds are the calculate function is wrong. The correction algo is
  used by many nand drivers, I *assume* it is correct. The calculate
  function was set to agree with u-boot (1.3.0).
 
 Yes, it seems that you changed the order in the calculation function
 while reworking the NDFC driver for arch/powerpc. So we should
 probably change this order back to the original version. And change
 it in U-Boot as well.
 
 BTW: I didn't see any problems with ECC so far with the current code.
 Feng, how did you spot this problem?

Ok, I think I have reproduced the problem programmatically. Basically,
I force a one bit error with the following patch:

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 8c21b89..91dd5b4 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -1628,11 +1628,22 @@ static void nand_write_page_hwecc(struct mtd_info *mtd, 
struct nand_chip *chip,
uint8_t *ecc_calc = chip-buffers-ecccalc;
const uint8_t *p = buf;
uint32_t *eccpos = chip-ecc.layout-eccpos;
+   static int count;
 
for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) {
chip-ecc.hwctl(mtd, NAND_ECC_WRITE);
-   chip-write_buf(mtd, p, eccsize);
-   chip-ecc.calculate(mtd, p, ecc_calc[i]);
+   if (count == 0) {
+   count = 1;
+   printk(Corrupt one bit: %08x = %08x\n,
+  *p, *p ^ 8);
+   *(uint8_t *)p ^= 8;
+   chip-write_buf(mtd, p, eccsize);
+   *(uint8_t *)p ^= 8;
+   nand_calculate_ecc(mtd, p, ecc_calc[i]);
+   } else {
+   chip-write_buf(mtd, p, eccsize);
+   chip-ecc.calculate(mtd, p, ecc_calc[i]);
+   }
}
 
for (i = 0; i  chip-ecc.total; i++)

Basically I write a one bit error to the NAND, but calculate with the
correct bit. This assumes nand_calculate_ecc is correct.

I then added debugs to the correction to make sure it corrected
properly:

diff --git a/drivers/mtd/nand/nand_ecc.c b/drivers/mtd/nand/nand_ecc.c
index c0cb87d..57dcaa1 100644
--- a/drivers/mtd/nand/nand_ecc.c
+++ b/drivers/mtd/nand/nand_ecc.c
@@ -483,14 +483,20 @@ int nand_correct_data(struct mtd_info *mtd, unsigned char 
*buf,
byte_addr = (addressbits[b2  0x3]  8) +
(addressbits[b1]  4) + addressbits[b0];
bit_addr = addressbits[b2  2];
+
+   printk(Single bit error: correct %08x = %08x\n,
+  buf[byte_addr], buf[byte_addr] ^ (1  bit_addr));
+
/* flip the bit */
buf[byte_addr] ^= (1  bit_addr);
return 1;
 
}
/* count nr of bits; use table lookup, faster than calculating it */
-   if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1)
+   if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1) {
+   printk(ECC DATA BAD\n); // SAM DBG
return 1;   /* error in ecc data; no action needed */
+   }
 
printk(KERN_ERR uncorrectable error : );
return -1;

With the current ndfc code, the error correction gets the bits wrong.
Switching it back to the original way and the correction is correct.

diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c
index 89bf85a..497e175 100644
--- a/drivers/mtd/nand/ndfc.c
+++ b/drivers/mtd/nand/ndfc.c
@@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd,
 
wmb();
ecc = in_be32(ndfc-ndfcbase + NDFC_ECC);
-   /* The NDFC uses Smart Media (SMC) bytes order */
-   ecc_code[0] = p[2];
-   ecc_code[1] = p[1];
+   ecc_code[0] = p[1];
+   ecc_code[1] = p[2];
ecc_code[2] = p[3];
 
return 0;

Does anybody see a problem with my method of reproducing the bug? This
bug is deadly for our customers. I don't want to make the change unless
it is absolutely necessary.

Cheers,
   Sean
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: [U-Boot] NAND ECC Error with wrong SMC ording bug

2009-08-20 Thread Victor Gallardo
Hi Sean,

The change is necessary in both Linux and u-boot. Without this change customer 
are seeing the problem.

Best Regards,

Victor Gallardo

 -Original Message-
 From: linuxppc-dev-bounces+vgallardo=amcc@lists.ozlabs.org 
 [mailto:linuxppc-dev-
 bounces+vgallardo=amcc@lists.ozlabs.org] On Behalf Of Sean MacLennan
 Sent: Thursday, August 20, 2009 12:37 PM
 To: Stefan Roese
 Cc: u-b...@lists.denx.de; Feng Kan; linux-...@lists.infradead.org; 
 linuxppc-...@ozlabs.org
 Subject: Re: [U-Boot] NAND ECC Error with wrong SMC ording bug
 
 On Thu, 20 Aug 2009 07:01:21 +0200
 Stefan Roese s...@denx.de wrote:
 
  On Thursday 20 August 2009 06:38:51 Sean MacLennan wrote:
I see other boards using SMC as well, can someone comment on the
change I am proposing.
Should I change the correction algorithm or the calculate
function? If the later is preferred
it would mean the change must be pushed in both U-Boot and Linux.
  
   Odds are the calculate function is wrong. The correction algo is
   used by many nand drivers, I *assume* it is correct. The calculate
   function was set to agree with u-boot (1.3.0).
 
  Yes, it seems that you changed the order in the calculation function
  while reworking the NDFC driver for arch/powerpc. So we should
  probably change this order back to the original version. And change
  it in U-Boot as well.
 
  BTW: I didn't see any problems with ECC so far with the current code.
  Feng, how did you spot this problem?
 
 Ok, I think I have reproduced the problem programmatically. Basically,
 I force a one bit error with the following patch:
 
 diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
 index 8c21b89..91dd5b4 100644
 --- a/drivers/mtd/nand/nand_base.c
 +++ b/drivers/mtd/nand/nand_base.c
 @@ -1628,11 +1628,22 @@ static void nand_write_page_hwecc(struct mtd_info 
 *mtd, struct nand_chip
 *chip,
   uint8_t *ecc_calc = chip-buffers-ecccalc;
   const uint8_t *p = buf;
   uint32_t *eccpos = chip-ecc.layout-eccpos;
 + static int count;
 
   for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) {
   chip-ecc.hwctl(mtd, NAND_ECC_WRITE);
 - chip-write_buf(mtd, p, eccsize);
 - chip-ecc.calculate(mtd, p, ecc_calc[i]);
 + if (count == 0) {
 + count = 1;
 + printk(Corrupt one bit: %08x = %08x\n,
 +*p, *p ^ 8);
 + *(uint8_t *)p ^= 8;
 + chip-write_buf(mtd, p, eccsize);
 + *(uint8_t *)p ^= 8;
 + nand_calculate_ecc(mtd, p, ecc_calc[i]);
 + } else {
 + chip-write_buf(mtd, p, eccsize);
 + chip-ecc.calculate(mtd, p, ecc_calc[i]);
 + }
   }
 
   for (i = 0; i  chip-ecc.total; i++)
 
 Basically I write a one bit error to the NAND, but calculate with the
 correct bit. This assumes nand_calculate_ecc is correct.
 
 I then added debugs to the correction to make sure it corrected
 properly:
 
 diff --git a/drivers/mtd/nand/nand_ecc.c b/drivers/mtd/nand/nand_ecc.c
 index c0cb87d..57dcaa1 100644
 --- a/drivers/mtd/nand/nand_ecc.c
 +++ b/drivers/mtd/nand/nand_ecc.c
 @@ -483,14 +483,20 @@ int nand_correct_data(struct mtd_info *mtd, unsigned 
 char *buf,
   byte_addr = (addressbits[b2  0x3]  8) +
   (addressbits[b1]  4) + addressbits[b0];
   bit_addr = addressbits[b2  2];
 +
 + printk(Single bit error: correct %08x = %08x\n,
 +buf[byte_addr], buf[byte_addr] ^ (1  bit_addr));
 +
   /* flip the bit */
   buf[byte_addr] ^= (1  bit_addr);
   return 1;
 
   }
   /* count nr of bits; use table lookup, faster than calculating it */
 - if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1)
 + if ((bitsperbyte[b0] + bitsperbyte[b1] + bitsperbyte[b2]) == 1) {
 + printk(ECC DATA BAD\n); // SAM DBG
   return 1;   /* error in ecc data; no action needed */
 + }
 
   printk(KERN_ERR uncorrectable error : );
   return -1;
 
 With the current ndfc code, the error correction gets the bits wrong.
 Switching it back to the original way and the correction is correct.
 
 diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c
 index 89bf85a..497e175 100644
 --- a/drivers/mtd/nand/ndfc.c
 +++ b/drivers/mtd/nand/ndfc.c
 @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd,
 
   wmb();
   ecc = in_be32(ndfc-ndfcbase + NDFC_ECC);
 - /* The NDFC uses Smart Media (SMC) bytes order */
 - ecc_code[0] = p[2];
 - ecc_code[1] = p[1];
 + ecc_code[0] = p[1];
 + ecc_code[1] = p[2];
   ecc_code[2] = p[3];
 
   return 0;
 
 Does anybody see a problem with my method of reproducing the bug? This
 bug is deadly for our customers. I don't want to make

Re: [U-Boot] NAND ECC Error with wrong SMC ording bug

2009-08-20 Thread Feng Kan

Hi Stefan:

We had a board with high number of correctable ECC errors. Which crashed 
the jffs when it

was miss correcting the wrong byte location.

Do you want me to submit a patch for this, or do you prefer to do it. I 
am submitting a patch

for linux right now.

Feng Kan
AMCC Software

On 08/19/2009 10:01 PM, Stefan Roese wrote:

On Thursday 20 August 2009 06:38:51 Sean MacLennan wrote:
   

I see other boards using SMC as well, can someone comment on the
change I am proposing.
Should I change the correction algorithm or the calculate function?
If the later is preferred
it would mean the change must be pushed in both U-Boot and Linux.
   

Odds are the calculate function is wrong. The correction algo is used
by many nand drivers, I *assume* it is correct. The calculate function
was set to agree with u-boot (1.3.0).
 

Yes, it seems that you changed the order in the calculation function while
reworking the NDFC driver for arch/powerpc. So we should probably change this
order back to the original version. And change it in U-Boot as well.

BTW: I didn't see any problems with ECC so far with the current code. Feng,
how did you spot this problem?

Cheers,
Stefan

--
DENX Software Engineering GmbH,  MD: Wolfgang Denk  Detlev Zundel
HRB 165235 Munich,  Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: off...@denx.de
   


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [U-Boot] NAND ECC Error with wrong SMC ording bug

2009-08-20 Thread vimal singh
snip

 With the current ndfc code, the error correction gets the bits wrong.
 Switching it back to the original way and the correction is correct.

 diff --git a/drivers/mtd/nand/ndfc.c b/drivers/mtd/nand/ndfc.c
 index 89bf85a..497e175 100644
 --- a/drivers/mtd/nand/ndfc.c
 +++ b/drivers/mtd/nand/ndfc.c
 @@ -101,9 +101,8 @@ static int ndfc_calculate_ecc(struct mtd_info *mtd,

        wmb();
        ecc = in_be32(ndfc-ndfcbase + NDFC_ECC);
 -       /* The NDFC uses Smart Media (SMC) bytes order */
 -       ecc_code[0] = p[2];
 -       ecc_code[1] = p[1];
 +       ecc_code[0] = p[1];
 +       ecc_code[1] = p[2];
        ecc_code[2] = p[3];

        return 0;

 Does anybody see a problem with my method of reproducing the bug? This
 bug is deadly for our customers. I don't want to make the change unless
 it is absolutely necessary..

Just one question: did you enabled MTD_NAND_ECC_SMC in configs?

-vimal


 Cheers,
   Sean

 __
 Linux MTD discussion mailing list
 http://lists.infradead.org/mailman/listinfo/linux-mtd/

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [U-Boot] NAND ECC Error with wrong SMC ording bug

2009-08-19 Thread Stefan Roese
On Thursday 20 August 2009 06:38:51 Sean MacLennan wrote:
  I see other boards using SMC as well, can someone comment on the
  change I am proposing.
  Should I change the correction algorithm or the calculate function?
  If the later is preferred
  it would mean the change must be pushed in both U-Boot and Linux.

 Odds are the calculate function is wrong. The correction algo is used
 by many nand drivers, I *assume* it is correct. The calculate function
 was set to agree with u-boot (1.3.0).

Yes, it seems that you changed the order in the calculation function while 
reworking the NDFC driver for arch/powerpc. So we should probably change this 
order back to the original version. And change it in U-Boot as well.

BTW: I didn't see any problems with ECC so far with the current code. Feng, 
how did you spot this problem?

Cheers,
Stefan

--
DENX Software Engineering GmbH,  MD: Wolfgang Denk  Detlev Zundel
HRB 165235 Munich,  Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-0 Fax: (+49)-8142-66989-80 Email: off...@denx.de
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev