RE: MPC831x (and others?) NAND erase performance improvements

2010-12-13 Thread David Laight
 
  An external IRQ line would let you limit interrupts to rising edges
  rather than all edges, though you'd lose the ability to 
  directly read the line status.
 
 oh, one cannot read the IRQ line? didn't know that. Also I'm not sure
 all Freescale CPUs can do rising edge.

I suspect that you may be able to leave the interrupt masked, but still
read the 'interrupt pending' register, which would have the same effect.
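
To make the masked-but-pending idea concrete, here is a minimal user-space sketch in C. It is only a model: the `fake_intc` struct, its `pending` and `mask` fields, and the `NAND_RDY_SRC` bit are invented for illustration and do not reflect any real interrupt controller's register layout. Whether a given part still latches edges in its pending register while the source is masked needs to be checked in the reference manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an interrupt controller; not a real register layout. */
struct fake_intc {
	uint32_t pending;	/* latched edge events, one bit per source */
	uint32_t mask;		/* 1 = source may interrupt the CPU */
};

#define NAND_RDY_SRC	(1u << 3)	/* assumed bit for the NAND ready line */

/* Hardware side of the model: an edge latches the pending bit regardless of mask. */
static void intc_latch_edge(struct fake_intc *ic, uint32_t src)
{
	ic->pending |= src;
}

/* Would the CPU actually take an interrupt right now? */
static bool intc_cpu_irq(const struct fake_intc *ic)
{
	return (ic->pending & ic->mask) != 0;
}

/* Poll the pending bit with the source left masked; clear it once seen. */
static bool nand_ready_poll(struct fake_intc *ic)
{
	assert((ic->mask & NAND_RDY_SRC) == 0);	/* caller keeps the source masked */
	if (!(ic->pending & NAND_RDY_SRC))
		return false;
	ic->pending &= ~NAND_RDY_SRC;	/* acknowledge the latched edge */
	return true;
}
```

In this model the CPU never takes an interrupt for the source, yet software can still see that the edge happened, which is the "same effect" as reading the line.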

Our HW engineers tend to feed everything into an FPGA since it
gives them a lot more flexibility over pin connections.
In which case the inverter is trivial.
(And the FPGA interface can read the status!)

David


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


RE: MPC831x (and others?) NAND erase performance improvements

2010-12-13 Thread Joakim Tjernlund
David Laight david.lai...@aculab.com wrote on 2010/12/13 09:33:37:


   An external IRQ line would let you limit interrupts to rising edges
   rather than all edges, though you'd lose the ability to
   directly read the line status.
 
  oh, one cannot read the IRQ line? didn't know that. Also I'm not sure
  all Freescale CPUs can do rising edge.

 I suspect that you may be able to leave the interrupt masked, but still
 read the 'interrupt pending' register. Which would have the same effect.

Ah, that should work too. I should be able to read the 'interrupt pending'
register at all times, even when the interrupt is masked.

What if one has several NAND chips to build a big FS? Is the NAND
controller equipped to handle that?


 Our HW engineers tend to feed everything into an FPGA since it
 gives them a lot more flexibility over pin connections.
 In which case the inverter is trivial.
 (and the fpga interface can read the status!)

Yes, but not all of our boards have an FPGA, and we load the FPGA
from SW, so it is a chicken-and-egg problem for us.

 Jocke



Re: MPC831x (and others?) NAND erase performance improvements

2010-12-13 Thread Scott Wood
On Mon, 13 Dec 2010 11:32:00 +0100
Joakim Tjernlund joakim.tjernl...@transmode.se wrote:

 David Laight david.lai...@aculab.com wrote on 2010/12/13 09:33:37:
 
 
An external IRQ line would let you limit interrupts to rising edges
rather than all edges, though you'd lose the ability to
directly read the line status.
  
   oh, one cannot read the IRQ line? didn't know that.
   Also I'm not sure all Freescale CPUs can do rising edge.

Ah right, 83xx has IPIC rather than MPIC.

  I suspect that you may be able to leave the interrupt masked, but still
  read the 'interrupt pending' register. Which would have the same effect.

 Ah, that should work too. I should be able to read the 'interrupt pending'
 register at all times, even when the interrupt is masked.

This could work OK if you have board logic to invert the signal.

 What if one has several NAND chips to build a big FS? Is the NAND
 controller equipped to handle that?

FCM can drive one NAND chip per eLBC chipselect, though possibly you
could go beyond that with a board-logic chipselect mechanism.

-Scott



Re: MPC831x (and others?) NAND erase performance improvements

2010-12-13 Thread Joakim Tjernlund
Scott Wood scottw...@freescale.com wrote on 2010/12/13 18:33:56:

 On Mon, 13 Dec 2010 11:32:00 +0100
 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:

  David Laight david.lai...@aculab.com wrote on 2010/12/13 09:33:37:
  
  
 An external IRQ line would let you limit interrupts to rising edges
 rather than all edges, though you'd lose the ability to
 directly read the line status.
   
oh, one cannot read the IRQ line? didn't know that.
Also I'm not sure all Freescale CPUs can do rising edge.

 Ah right, 83xx has IPIC rather than MPIC.

   I suspect that you may be able to leave the interrupt masked, but still
   read the 'interrupt pending' register. Which would have the same effect.
 
  Ah, that should work too. I should be able to read the 'interrupt pending'
  register at all times, even when the interrupt is masked.

 This could work OK if you have board logic to invert the signal.

yeah, just a NAND gate :)


  What if one has several NAND chips to build a big FS? Is the NAND
  controller equipped to handle that?

 FCM can drive one NAND chip per eLBC chipselect, though possibly you
 could go beyond that with a board-logic chipselect mechanism.

hmm, then I guess one would have to use one GPIO/IRQ per NAND chip?



Re: MPC831x (and others?) NAND erase performance improvements

2010-12-13 Thread Scott Wood
On Mon, 13 Dec 2010 18:41:32 +0100
Joakim Tjernlund joakim.tjernl...@transmode.se wrote:

 Scott Wood scottw...@freescale.com wrote on 2010/12/13 18:33:56:
 
  On Mon, 13 Dec 2010 11:32:00 +0100
  Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
 
   What if one has several NAND chips to build a big FS? Is the NAND
   controller equipped to handle that?
 
  FCM can drive one NAND chip per eLBC chipselect, though possibly you
  could go beyond that with a board-logic chipselect mechanism.
 
 hmm, then I guess one would have to use one GPIO/IRQ per NAND chip?

Couldn't you just tie together all the open-drain busy lines before you
invert it?  You'll only be driving one NAND chip at a time anyway; the
others should not be asserting busy.
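
A small model of what tying the lines together means electrically, as a sketch in C. The function names are made up, and it assumes each chip's R/B# output is open-drain on a net with a pull-up, so any busy chip pulls the shared net low.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Model of several open-drain R/B# outputs tied to one pulled-up net.
 * chip_busy[i] is true while chip i is busy (pulling the net low).
 * Returns the level of the shared net: true = high = everyone ready.
 */
static bool shared_rb_level(const bool *chip_busy, size_t nchips)
{
	assert(chip_busy != NULL || nchips == 0);
	for (size_t i = 0; i < nchips; i++)
		if (chip_busy[i])
			return false;	/* any busy chip pulls the net low */
	return true;			/* pull-up wins: all chips ready */
}

/* Level seen after the board inverter: high while any chip is busy. */
static bool inverted_busy(const bool *chip_busy, size_t nchips)
{
	return !shared_rb_level(chip_busy, nchips);
}
```

With only one chip being driven at a time, the shared net follows that chip's busy state, so a single GPIO/IRQ input is enough in this model.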

-Scott



Re: MPC831x (and others?) NAND erase performance improvements

2010-12-13 Thread Joakim Tjernlund
Scott Wood scottw...@freescale.com wrote on 2010/12/13 18:51:31:

 On Mon, 13 Dec 2010 18:41:32 +0100
 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:

  Scott Wood scottw...@freescale.com wrote on 2010/12/13 18:33:56:
  
   On Mon, 13 Dec 2010 11:32:00 +0100
   Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
  
What if one has several NAND chips to build a big FS? Is the NAND
controller equipped to handle that?
  
   FCM can drive one NAND chip per eLBC chipselect, though possibly you
   could go beyond that with a board-logic chipselect mechanism.
 
  hmm, then I guess one would have to use one GPIO/IRQ per NAND chip?

 Couldn't you just tie together all the open-drain busy lines before you
 invert it?  You'll only be driving one NAND chip at a time anyway; the
 others should not be asserting busy.

hmm, I guess that would work (didn't know they were open-drain), thanks.
Is that how the FCM does it?

 Jocke



Re: MPC831x (and others?) NAND erase performance improvements

2010-12-13 Thread Scott Wood
On Mon, 13 Dec 2010 20:30:27 +0100
Joakim Tjernlund joakim.tjernl...@transmode.se wrote:

 Scott Wood scottw...@freescale.com wrote on 2010/12/13 18:51:31:
 
  On Mon, 13 Dec 2010 18:41:32 +0100
  Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
 
   Scott Wood scottw...@freescale.com wrote on 2010/12/13 18:33:56:
   
On Mon, 13 Dec 2010 11:32:00 +0100
Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
   
 What if one has several NAND chips to build a big FS? Is the NAND
 controller equipped to handle that?
   
FCM can drive one NAND chip per eLBC chipselect, though possibly you
could go beyond that with a board-logic chipselect mechanism.
  
   hmm, then I guess one would have to use one GPIO/IRQ per NAND chip?
 
  Couldn't you just tie together all the open-drain busy lines before you
  invert it?  You'll only be driving one NAND chip at a time anyway; the
  others should not be asserting busy.
 
 hmm, I guess that would work (didn't know they were open-drain), thanks.
 Is that how the FCM does it?

Yes, that's what started this discussion. :-)

The problem there is that they share the line with all chipselects,
NAND or otherwise.

-Scott



Re: MPC831x (and others?) NAND erase performance improvements

2010-12-13 Thread Joakim Tjernlund
Scott Wood scottw...@freescale.com wrote on 2010/12/13 20:49:50:

 On Mon, 13 Dec 2010 20:30:27 +0100
 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:

  Scott Wood scottw...@freescale.com wrote on 2010/12/13 18:51:31:
  
   On Mon, 13 Dec 2010 18:41:32 +0100
   Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
  
Scott Wood scottw...@freescale.com wrote on 2010/12/13 18:33:56:

 On Mon, 13 Dec 2010 11:32:00 +0100
 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:

  What if one has several NAND chips to build a big FS? Is the NAND
  controller equipped to handle that?

 FCM can drive one NAND chip per eLBC chipselect, though possibly you
 could go beyond that with a board-logic chipselect mechanism.
   
hmm, then I guess one would have to use one GPIO/IRQ per NAND chip?
  
   Couldn't you just tie together all the open-drain busy lines before you
   invert it?  You'll only be driving one NAND chip at a time anyway; the
   others should not be asserting busy.
 
  hmm, I guess that would work (didn't know they were open-drain), thanks.
  Is that how the FCM does it?

 Yes, that's what started this discussion. :-)

True, I must be getting old :)


 The problem there is that they share the line with all chipselects,
 NAND or otherwise.

Right, thanks for reminding me.



Re: MPC831x (and others?) NAND erase performance improvements

2010-12-11 Thread Joakim Tjernlund
Scott Wood scottw...@freescale.com wrote on 2010/12/10 18:56:39:

 On Fri, 10 Dec 2010 13:39:01 +0100
 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:

  Scott Wood scottw...@freescale.com wrote on 2010/12/08 23:25:59:
  
   On Wed, 8 Dec 2010 17:02:45 -0500
   Mark Mason ma...@postdiluvian.org wrote:
  
I don't think that using a software NAND controller instead of the LBC
FCM mode is all that bad.  Again, I haven't actually done it, so check
the MTD docs, but I'm pretty sure the software is meant to do that, so
it doesn't even really constitute a fix.  Assuming that it is
supported then I doubt that configuring the NAND layer to use your
setup would be any harder than configuring the FCM.
  
   The MTD layer supports some really simple NAND controllers, but what do
   you mean by not having a controller at all?  Hooking everything up to
   GPIO?  Using UPM?
  
   There is already a UPM NAND driver, BTW.
  
   You would lose hardware ECC and the ability to be interrupt-driven (the
   latter should be possible with SW changes, using GPIO interrupts).
 
  hmm, you think it would be possible to use one of the IRQ pins instead?

 GPIO should be fine, software just needs to be changed to use the
 interrupt functionality.

 An external IRQ line would let you limit interrupts to rising edges
 rather than all edges, though you'd lose the ability to directly read
 the line status.

oh, one cannot read the IRQ line? didn't know that. Also I'm not sure
all Freescale CPUs can do rising edge.



Re: MPC831x (and others?) NAND erase performance improvements

2010-12-10 Thread Andre Schwarz

Scott,

do you think this issue also applies to MPC8377?

I'm in the middle of a small redesign for series production and would 
like not to miss a thing.

We have NAND, NOR and MRAM connected to the LBC.

Since RFS is running from NAND and we use the MRAM as a non-volatile 
SRAM I'd like to avoid being hit by this issue.


Any comments from your side?

Regards,
André


On Wed, 8 Dec 2010 22:26:59 +0100
Joakim Tjernlund joakim.tjernl...@transmode.se wrote:

Scott Wood scottw...@freescale.com wrote on 2010/12/08 21:25:51:

On Wed, 8 Dec 2010 21:11:08 +0100
Joakim Tjernlund joakim.tjernl...@transmode.se wrote:

Scott Wood scottw...@freescale.com wrote on 2010/12/08 20:59:28:

On Wed, 8 Dec 2010 20:57:03 +0100
Joakim Tjernlund joakim.tjernl...@transmode.se wrote:


Can you think of any workaround such as not connecting the BUSY pin at all?

Maybe connect the busy pin to a gpio?

Is BUSY required for sane operation or is it an optimization?

You could probably get away without it by inserting delays if you know
the chip specs well enough.

Urgh, that does not feel like a good solution.

No, but you asked if it could be done, and if it was just a
performance issue. :-)


Is there any risk that the NAND device will drive the LB and corrupt
the bus for other devices?

I think the only thing the NAND chip should be driving is the busy pin,

OK, good. What function is actually lost if one uses a GPIO instead of
BUSY?

Not much, if you enable interrupts on the GPIO pin.  The driver would
have to be reworked a bit, of course.


You think Freescale could test and validate a GPIO solution? I don't
think we will be very happy to design our board around an unproven
workaround.

Ask your sales/support contacts.


An even better workaround would be if one could add logic between the
NAND and the CPU which would compensate for this defect without needing
special SW fixes.

The problem with that is when would you assert the chipselect again to
check if it's done?  Current SW depends on being able to tell the LBC
to interrupt (or take other action) when busy goes away.

I suppose you could poll with status reads, which could at least be
preempted if you've got something higher priority to do with the LBC.
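
A rough sketch in C of what polling with status reads could look like. READ STATUS (0x70) and the ready/fail status bits are the standard NAND values; everything else here is hypothetical: the `nand_bus` hooks, the return codes, and the `fake_chip` used to exercise the loop are not taken from any existing driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NAND_CMD_STATUS		0x70	/* standard READ STATUS command */
#define NAND_STATUS_FAIL	0x01	/* last program/erase failed */
#define NAND_STATUS_READY	0x40	/* chip is ready for a new command */

/* Hypothetical bus hooks; a real driver would go through the controller. */
struct nand_bus {
	void (*write_cmd)(void *ctx, uint8_t cmd);
	uint8_t (*read_byte)(void *ctx);
	void *ctx;
};

/*
 * Poll the status register up to max_polls times. Each poll is a short
 * bus access, so other bus users can get in between polls.
 * Returns 0 on success, -1 if the chip reports failure, -2 on timeout.
 */
static int nand_poll_status(const struct nand_bus *bus, unsigned int max_polls)
{
	assert(bus && bus->write_cmd && bus->read_byte);
	for (unsigned int i = 0; i < max_polls; i++) {
		bus->write_cmd(bus->ctx, NAND_CMD_STATUS);
		uint8_t st = bus->read_byte(bus->ctx);
		if (st & NAND_STATUS_READY)
			return (st & NAND_STATUS_FAIL) ? -1 : 0;
		/* a real driver would sleep or yield here */
	}
	return -2;
}

/* Fake chip for illustration: reports busy for a fixed number of reads. */
struct fake_chip {
	unsigned int busy_reads;
	uint8_t final_status;
};

static void fake_write_cmd(void *ctx, uint8_t cmd)
{
	(void)ctx;
	(void)cmd;
}

static uint8_t fake_read_byte(void *ctx)
{
	struct fake_chip *c = ctx;

	if (c->busy_reads) {
		c->busy_reads--;
		return 0x00;	/* ready bit clear: still busy */
	}
	return c->final_status;
}
```

The trade-off is CPU time and extra bus cycles spent polling in exchange for not holding the bus for the whole erase.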

-Scott




MATRIX VISION GmbH, Talstrasse 16, DE-71570 Oppenweiler
Registergericht: Amtsgericht Stuttgart, HRB 271090
Geschaeftsfuehrer: Gerhard Thullner, Werner Armingeon, Uwe Furtner


Re: MPC831x (and others?) NAND erase performance improvements

2010-12-10 Thread Joakim Tjernlund
Andre Schwarz andre.schw...@matrix-vision.de wrote on 2010/12/10 09:47:10:

 Scott,

 do you think this issue also applies to MPC8377?

Probably, I think this is so for all eLBC controllers.


 I'm in the middle of a small redesign for series production and would
 like not to miss a thing.
 We have NAND, NOR and MRAM connected to the LBC.

 Since RFS is running from NAND and we use the MRAM as a non-volatile
 SRAM I'd like to avoid being hit by this issue.

Please report back, I really want to know if this works and if there
are any drawbacks.


 Any comments from your side?

 Regards,
 André

  On Wed, 8 Dec 2010 22:26:59 +0100
  Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
 
  Scott Wood scottw...@freescale.com wrote on 2010/12/08 21:25:51:
  On Wed, 8 Dec 2010 21:11:08 +0100
  Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
 
  Scott Wood scottw...@freescale.com wrote on 2010/12/08 20:59:28:
  On Wed, 8 Dec 2010 20:57:03 +0100
  Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
 
  Can you think of any workaround such as not connecting the BUSY pin at 
  all?
  Maybe connect the busy pin to a gpio?
  Is BUSY required for sane operation or is it an optimization?
  You could probably get away without it by inserting delays if you know
  the chip specs well enough.
  Urgh, that does not feel like a good solution.
  No, but you asked if it could be done, and if it was just a
  performance issue. :-)
 
  Is there any risk that the NAND device will drive the LB and corrupt
  the bus for other devices?
  I think the only thing the NAND chip should be driving is the busy pin,
  OK, good. What function is actually lost if one uses a GPIO instead of
  BUSY?
  Not much, if you enable interrupts on the GPIO pin.  The driver would
  have to be reworked a bit, of course.
 
  You think Freescale could test and validate a GPIO solution? I don't
  think we will be very happy to design our board around an unproven
  workaround.
  Ask your sales/support contacts.
 
  An even better workaround would be if one could add logic between the
  NAND and the CPU which would compensate for this defect without needing
  special SW fixes.
  The problem with that is when would you assert the chipselect again to
  check if it's done?  Current SW depends on being able to tell the LBC
  to interrupt (or take other action) when busy goes away.
 
  I suppose you could poll with status reads, which could at least be
  preempted if you've got something higher priority to do with the LBC.
 
  -Scott
 


Re: MPC831x (and others?) NAND erase performance improvements

2010-12-10 Thread Joakim Tjernlund
Scott Wood scottw...@freescale.com wrote on 2010/12/08 23:25:59:

 On Wed, 8 Dec 2010 17:02:45 -0500
 Mark Mason ma...@postdiluvian.org wrote:

  I don't think that using a software NAND controller instead of the LBC
  FCM mode is all that bad.  Again, I haven't actually done it, so check
  the MTD docs, but I'm pretty sure the software is meant to do that, so
  it doesn't even really constitute a fix.  Assuming that it is
  supported then I doubt that configuring the NAND layer to use your
  setup would be any harder than configuring the FCM.

 The MTD layer supports some really simple NAND controllers, but what do
 you mean by not having a controller at all?  Hooking everything up to
 GPIO?  Using UPM?

 There is already a UPM NAND driver, BTW.

 You would lose hardware ECC and the ability to be interrupt-driven (the
 latter should be possible with SW changes, using GPIO interrupts).

hmm, you think it would be possible to use one of the IRQ pins instead?



Re: MPC831x (and others?) NAND erase performance improvements

2010-12-10 Thread Scott Wood
On Fri, 10 Dec 2010 13:39:01 +0100
Joakim Tjernlund joakim.tjernl...@transmode.se wrote:

 Scott Wood scottw...@freescale.com wrote on 2010/12/08 23:25:59:
 
  On Wed, 8 Dec 2010 17:02:45 -0500
  Mark Mason ma...@postdiluvian.org wrote:
 
   I don't think that using a software NAND controller instead of the LBC
   FCM mode is all that bad.  Again, I haven't actually done it, so check
   the MTD docs, but I'm pretty sure the software is meant to do that, so
   it doesn't even really constitute a fix.  Assuming that it is
   supported then I doubt that configuring the NAND layer to use your
   setup would be any harder than configuring the FCM.
 
  The MTD layer supports some really simple NAND controllers, but what do
  you mean by not having a controller at all?  Hooking everything up to
  GPIO?  Using UPM?
 
  There is already a UPM NAND driver, BTW.
 
  You would lose hardware ECC and the ability to be interrupt-driven (the
  latter should be possible with SW changes, using GPIO interrupts).
 
 hmm, you think it would be possible to use one of the IRQ pins instead?

GPIO should be fine, software just needs to be changed to use the
interrupt functionality.

An external IRQ line would let you limit interrupts to rising edges
rather than all edges, though you'd lose the ability to directly read
the line status.

-Scott



Re: MPC831x (and others?) NAND erase performance improvements

2010-12-08 Thread Joakim Tjernlund

 On Mon, 6 Dec 2010 22:15:54 -0500
 Mark Mason ma...@postdiluvian.org wrote:

  A few months ago I ran into some performance problems involving
  UBI/NAND erases holding other devices off the LBC on an MPC8315.  I
  found a solution for this, which worked well, at least with the
  hardware I was working with.  I suspect the same problem affects other
  PPCs, probably including multicore devices, and maybe other
  architectures as well.
 
  I don't have experience with similar NAND controllers on other
  devices, so I'd like to explain what I found and see if someone who's
  more familiar with the family and/or driver can tell if this is
  useful.
 
  The problem cropped up when there was a lot of traffic to the NAND
  (Samsung K9WAGU08U1B-PIB0), with the NAND being on the LBC along with
  a video chip that needed constant and prompt attention.

 If you attach NAND to the LBC, you should not attach anything else to
 it which is latency-sensitive.

This feature makes the LBC useless to us. Is there some workaround or plan
to address this limitation?

  Jocke



Re: MPC831x (and others?) NAND erase performance improvements

2010-12-08 Thread Scott Wood
On Wed, 8 Dec 2010 08:59:49 +0100
Joakim Tjernlund joakim.tjernl...@transmode.se wrote:

 
  On Mon, 6 Dec 2010 22:15:54 -0500
  Mark Mason ma...@postdiluvian.org wrote:
 
   A few months ago I ran into some performance problems involving
   UBI/NAND erases holding other devices off the LBC on an MPC8315.  I
   found a solution for this, which worked well, at least with the
   hardware I was working with.  I suspect the same problem affects other
   PPCs, probably including multicore devices, and maybe other
   architectures as well.
  
   I don't have experience with similar NAND controllers on other
   devices, so I'd like to explain what I found and see if someone who's
   more familiar with the family and/or driver can tell if this is
   useful.
  
   The problem cropped up when there was a lot of traffic to the NAND
   (Samsung K9WAGU08U1B-PIB0), with the NAND being on the LBC along with
   a video chip that needed constant and prompt attention.
 
  If you attach NAND to the LBC, you should not attach anything else to
  it which is latency-sensitive.
 
 This feature makes the LBC useless to us. Is there some workaround or plan
 to address this limitation?

Complain to your support or sales contact.

I've complained about it in the past, and got a "but pins are a limited
resource!" response.  They need to hear that it's a problem from
customers.

-Scott



Re: MPC831x (and others?) NAND erase performance improvements

2010-12-08 Thread Joakim Tjernlund
Scott Wood scottw...@freescale.com wrote on 2010/12/08 18:18:39:

 On Wed, 8 Dec 2010 08:59:49 +0100
 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:

  
   On Mon, 6 Dec 2010 22:15:54 -0500
   Mark Mason ma...@postdiluvian.org wrote:
  
A few months ago I ran into some performance problems involving
UBI/NAND erases holding other devices off the LBC on an MPC8315.  I
found a solution for this, which worked well, at least with the
hardware I was working with.  I suspect the same problem affects other
PPCs, probably including multicore devices, and maybe other
architectures as well.
   
I don't have experience with similar NAND controllers on other
devices, so I'd like to explain what I found and see if someone who's
more familiar with the family and/or driver can tell if this is
useful.
   
The problem cropped up when there was a lot of traffic to the NAND
(Samsung K9WAGU08U1B-PIB0), with the NAND being on the LBC along with
a video chip that needed constant and prompt attention.
  
   If you attach NAND to the LBC, you should not attach anything else to
   it which is latency-sensitive.
 
  This feature makes the LBC useless to us. Is there some workaround or plan
  to address this limitation?

 Complain to your support or sales contact.

 I've complained about it in the past, and got a "but pins are a limited
 resource!" response.  They need to hear that it's a problem from
 customers.

Done, let's see what I get in return. I think this problem will be
a major obstacle for our next generation boards which will be NAND
based.

   Jocke



Re: MPC831x (and others?) NAND erase performance improvements

2010-12-08 Thread Mark Mason
Joakim Tjernlund joakim.tjernl...@transmode.se wrote:

 Scott Wood scottw...@freescale.com wrote on 2010/12/08 18:18:39:
 
  On Wed, 8 Dec 2010 08:59:49 +0100
  Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
 
   
On Mon, 6 Dec 2010 22:15:54 -0500
Mark Mason ma...@postdiluvian.org wrote:
   
 A few months ago I ran into some performance problems involving
 UBI/NAND erases holding other devices off the LBC on an MPC8315.  I
 found a solution for this, which worked well, at least with the
 hardware I was working with.  I suspect the same problem affects other
 PPCs, probably including multicore devices, and maybe other
 architectures as well.

 I don't have experience with similar NAND controllers on other
 devices, so I'd like to explain what I found and see if someone who's
 more familiar with the family and/or driver can tell if this is
 useful.

 The problem cropped up when there was a lot of traffic to the NAND
 (Samsung K9WAGU08U1B-PIB0), with the NAND being on the LBC along with
 a video chip that needed constant and prompt attention.
   
If you attach NAND to the LBC, you should not attach anything else to
it which is latency-sensitive.
  
   This feature makes the LBC useless to us. Is there some workaround or 
   plan
   to address this limitation?
 
  Complain to your support or sales contact.
 
  I've complained about it in the past, and got a "but pins are a limited
  resource!" response.  They need to hear that it's a problem from
  customers.
 
 Done, let's see what I get in return. I think this problem will be
 a major obstacle for our next generation boards which will be NAND
 based.

It was a big problem, and a big surprise, for me too.  The next
generation of a couple of the chips on the bus have PCIe, but those
are noticeably more expensive.

Another problem I ran into was that the DMA performance from a
non-incrementing address was abysmal; PIO turned out to be
significantly faster.  I guess internally the bus does an entire
cacheline transfer for every word read from a fixed address, or
something like that.  I was doing DMA from a device that had only six
address bits; it should have been in the middle of the bus with the
bottom address pins not connected, which would have allowed
incrementing-address DMA.  The transfer speed wasn't so much of a
problem, but the longer transfers meant that there was that much less
bus bandwidth for the other devices, so we wound up sacrificing CPU to
get more bus bandwidth.
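
For reference, the PIO alternative to fixed-address DMA is just a loop like the sketch below. `pio_read_fixed()` is a made-up name, and plain `volatile` reads stand in for whatever I/O accessors the platform provides (a PowerPC kernel driver would normally use accessors such as `in_be32()` instead).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read count words from a single fixed-address data register using PIO. */
static void pio_read_fixed(volatile const uint32_t *reg, uint32_t *dst,
			   size_t count)
{
	assert(reg != NULL && (dst != NULL || count == 0));
	for (size_t i = 0; i < count; i++)
		dst[i] = *reg;	/* one bus read per word, same address each time */
}
```

The CPU is tied up for the whole transfer, which is the "sacrificing CPU" part, but each word costs only one bus access.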


Re: MPC831x (and others?) NAND erase performance improvements

2010-12-08 Thread Joakim Tjernlund
Mark Mason ma...@postdiluvian.org wrote on 2010/12/08 20:26:16:

 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:

  Scott Wood scottw...@freescale.com wrote on 2010/12/08 18:18:39:
  
   On Wed, 8 Dec 2010 08:59:49 +0100
   Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
  

 On Mon, 6 Dec 2010 22:15:54 -0500
 Mark Mason ma...@postdiluvian.org wrote:

  A few months ago I ran into some performance problems involving
  UBI/NAND erases holding other devices off the LBC on an MPC8315.  I
  found a solution for this, which worked well, at least with the
  hardware I was working with.  I suspect the same problem affects 
  other
  PPCs, probably including multicore devices, and maybe other
  architectures as well.
 
  I don't have experience with similar NAND controllers on other
  devices, so I'd like to explain what I found and see if someone 
  who's
  more familiar with the family and/or driver can tell if this is
  useful.
 
  The problem cropped up when there was a lot of traffic to the NAND
  (Samsung K9WAGU08U1B-PIB0), with the NAND being on the LBC along 
  with
  a video chip that needed constant and prompt attention.

 If you attach NAND to the LBC, you should not attach anything else to
 it which is latency-sensitive.
   
This feature makes the LBC useless to us. Is there some workaround or 
plan
to address this limitation?
  
   Complain to your support or sales contact.
  
   I've complained about it in the past, and got a "but pins are a limited
   resource!" response.  They need to hear that it's a problem from
   customers.
 
  Done, let's see what I get in return. I think this problem will be
  a major obstacle for our next generation boards which will be NAND
  based.

 It was a big problem, and a big surprise, for me too.  The next
 generation of a couple of the chips on the bus have PCIe, but those
 are noticeably more expensive.

Can you think of any workaround such as not connecting the BUSY pin at all?



Re: MPC831x (and others?) NAND erase performance improvements

2010-12-08 Thread Scott Wood
On Wed, 8 Dec 2010 20:57:03 +0100
Joakim Tjernlund joakim.tjernl...@transmode.se wrote:

 Mark Mason ma...@postdiluvian.org wrote on 2010/12/08 20:26:16:
 
  Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
 
   Scott Wood scottw...@freescale.com wrote on 2010/12/08 18:18:39:
   
On Wed, 8 Dec 2010 08:59:49 +0100
Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
   
  If you attach NAND to the LBC, you should not attach anything else 
  to
  it which is latency-sensitive.

 This feature makes the LBC useless to us. Is there some workaround 
 or plan
 to address this limitation?
   
Complain to your support or sales contact.
   
I've complained about it in the past, and got a "but pins are a limited
resource!" response.  They need to hear that it's a problem from
customers.
  
   Done, let's see what I get in return. I think this problem will be
   a major obstacle for our next generation boards which will be NAND
   based.
 
  It was a big problem, and a big surprise, for me too.  The next
  generation of a couple of the chips on the bus have PCIe, but those
  are noticeably more expensive.
 
 Can you think of any workaround such as not connecting the BUSY pin at all?

Maybe connect the busy pin to a gpio?

-Scott



Re: MPC831x (and others?) NAND erase performance improvements

2010-12-08 Thread Joakim Tjernlund
Scott Wood scottw...@freescale.com wrote on 2010/12/08 20:59:28:

 On Wed, 8 Dec 2010 20:57:03 +0100
 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:

  Mark Mason ma...@postdiluvian.org wrote on 2010/12/08 20:26:16:
  
   Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
  
Scott Wood scottw...@freescale.com wrote on 2010/12/08 18:18:39:

 On Wed, 8 Dec 2010 08:59:49 +0100
 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:

   If you attach NAND to the LBC, you should not attach anything 
   else to
   it which is latency-sensitive.
 
  This feature makes the LBC useless to us. Is there some 
  workaround or plan
  to address this limitation?

 Complain to your support or sales contact.

 I've complained about it in the past, and got a "but pins are a limited
 resource!" response.  They need to hear that it's a problem from
 customers.
   
Done, let's see what I get in return. I think this problem will be
a major obstacle for our next generation boards which will be NAND
based.
  
   It was a big problem, and a big surprise, for me too.  The next
   generation of a couple of the chips on the bus have pcie, but those
   are noticeably more expensive.
 
  Can you think of any workaround such as not connecting the BUSY pin at all?

 Maybe connect the busy pin to a gpio?

Is BUSY required for sane operation or is it an optimization?
Is there any risk that the NAND device will drive the LB and corrupt
the bus for other devices?

I can't tell, haven't studied NAND in detail yet.

 Jocke



Re: MPC831x (and others?) NAND erase performance improvements

2010-12-08 Thread Scott Wood
On Wed, 8 Dec 2010 21:11:08 +0100
Joakim Tjernlund joakim.tjernl...@transmode.se wrote:

 Scott Wood scottw...@freescale.com wrote on 2010/12/08 20:59:28:
 
  On Wed, 8 Dec 2010 20:57:03 +0100
  Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
 
   Can you think of any workaround such as not connecting the BUSY pin at 
   all?
 
  Maybe connect the busy pin to a gpio?
 
 Is BUSY required for sane operation or is it an optimization?

You could probably get away without it by inserting delays if you know
the chip specs well enough.

 Is there any risk that the NAND device will drive the LB and corrupt
 the bus for other devices?

I think the only thing the NAND chip should be driving is the busy pin,
until nCE and nRE are lowered.

-Scott



Re: MPC831x (and others?) NAND erase performance improvements

2010-12-08 Thread Joakim Tjernlund
Scott Wood scottw...@freescale.com wrote on 2010/12/08 21:25:51:

 On Wed, 8 Dec 2010 21:11:08 +0100
 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:

  Scott Wood scottw...@freescale.com wrote on 2010/12/08 20:59:28:
  
   On Wed, 8 Dec 2010 20:57:03 +0100
   Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
  
Can you think of any workaround such as not connecting the BUSY pin at 
all?
  
   Maybe connect the busy pin to a gpio?
 
  Is BUSY required for sane operation or is it an optimization?

 You could probably get away without it by inserting delays if you know
 the chip specs well enough.

Urgh, that does not feel like a good solution. One would have to add
big margins to the delays, making the NAND much slower.


  Is there any risk that the NAND device will drive the LB and corrupt
  the bus for other devices?

 I think the only thing the NAND chip should be driving is the busy pin,

OK, good. What function is actually lost if one uses a GPIO instead of
BUSY?
You think Freescale could test and validate a GPIO solution? I don't
think we will be very happy to design our board around an unproven
workaround.

An even better workaround would be if one could add logic between the
NAND and the CPU which would compensate for this defect without needing
special SW fixes.



Re: MPC831x (and others?) NAND erase performance improvements

2010-12-08 Thread Mark Mason
Joakim Tjernlund joakim.tjernl...@transmode.se wrote:

   Is there any risk that the NAND device will drive the LB and
   corrupt the bus for other devices?
 
  I think the only thing the NAND chip should be driving is the busy pin,
 
 OK, good. What function is actually lost if one uses a GPIO instead
 of BUSY?

I tried to take this route, since it was a fairly minor board change.
Unfortunately all the GPIOs were already used.

I looked long and hard for a way to not have the NAND hold the bus
with BUSY.  If you don't connect BUSY then you can't use the LBC's
flash controller (FCM mode).

I haven't done it personally, but I believe that connecting BUSY to a
GPIO is a very common thing to do, since this is the route you'd have to
take if you didn't have a built-in flash controller.  The Linux MTD
layer supports it.
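
As a rough illustration of the GPIO route described above, here is a
minimal userspace sketch, not driver code.  It assumes the board can
sample the NAND's R/B# line through some GPIO read hook; `read_rb_fn`,
`wait_ready` and `fake_rb` are hypothetical names made up for this
example and are not part of the MTD API.  The point is only that the
wait becomes a bounded software loop that can be preempted, instead of
the LBC stalling on BUSY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical board hook: returns true when R/B# reads high (ready). */
typedef bool (*read_rb_fn)(void *ctx);

/*
 * Sample the ready line up to max_polls times.
 * Returns the number of samples taken when the device became ready,
 * or -1 if it was still busy after max_polls samples.
 */
static int wait_ready(read_rb_fn read_rb, void *ctx, int max_polls)
{
    assert(read_rb != NULL && max_polls > 0);

    for (int i = 1; i <= max_polls; i++) {
        if (read_rb(ctx))
            return i;
        /* A real driver would sleep or yield here rather than spin. */
    }
    return -1;
}

/* Simulated GPIO for the example: reports busy for *ctx more samples. */
static bool fake_rb(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return false;
    }
    return true;
}
```

With a simulated line that stays busy for three samples, `wait_ready`
returns on the fourth; with a line that never goes ready it gives up
and reports a timeout, which is the case the caller has to handle.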

 An even better workaround would be if one could add logic between
 the NAND and the CPU which would compensate for this defect without
 needing special SW fixes.

That probably won't work.  Presumably you're talking about something
like a gate so the BUSY is only passed from the NAND when the NAND's
chip select is asserted.  Unfortunately the NAND controller is
monitoring the BUSY line, and if it sees the signal deassert then it
will think the NAND is done.

I don't think that using a software NAND controller instead of the LBC
FCM mode is all that bad.  Again, I haven't actually done it, so check
the MTD docs, but I'm pretty sure the software is meant to do that, so
it doesn't even really constitute a fix.  Assuming that it is
supported then I doubt that configuring the NAND layer to use your
setup would be any harder than configuring the FCM.  And, U-Boot uses
the Linux MTD code, so you'd get the same support there.

There might also be a way to keep the BUSY and find a workaround with
the other chips on the bus, depending on what they are.  Chances are
that they have a BUSY, but maybe you could move the peripheral's BUSY
to another LBC line and use UPM mode to interpret that line as a BUSY.
Writing UPM programs isn't really all that difficult (well, not the
second time you do it anyway).  That's getting into kludge territory,
though.


Re: MPC831x (and others?) NAND erase performance improvements

2010-12-08 Thread Scott Wood
On Wed, 8 Dec 2010 22:26:59 +0100
Joakim Tjernlund joakim.tjernl...@transmode.se wrote:

 Scott Wood scottw...@freescale.com wrote on 2010/12/08 21:25:51:
 
  On Wed, 8 Dec 2010 21:11:08 +0100
  Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
 
   Scott Wood scottw...@freescale.com wrote on 2010/12/08 20:59:28:
   
On Wed, 8 Dec 2010 20:57:03 +0100
Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
   
 Can you think of any workaround such as not connecting the BUSY pin 
 at all?
   
Maybe connect the busy pin to a gpio?
  
   Is BUSY required for sane operation or is it an optimization?
 
  You could probably get away without it by inserting delays if you know
  the chip specs well enough.
 
 Urgh, that does not feel like a good solution.

No, but you asked if it could be done, and if it was just a
performance issue. :-)

   Is there any risk that the NAND device will drive the LB and corrupt
   the bus for other devices?
 
  I think the only thing the NAND chip should be driving is the busy pin,
 
 OK, good. What function is actually lost if one uses a GPIO instead of
 BUSY?

Not much, if you enable interrupts on the GPIO pin.  The driver would
have to be reworked a bit, of course.

 You think Freescale could test and validate a GPIO solution? I don't
 think we will be very happy to design our board around an unproven
 workaround.

Ask your sales/support contacts.

 An even better workaround would be if one could add logic between the
 NAND and the CPU which would compensate for this defect without needing
 special SW fixes.

The problem with that is when would you assert the chipselect again to
check if it's done?  Current SW depends on being able to tell the LBC
to interrupt (or take other action) when busy goes away.

I suppose you could poll with status reads, which could at least be
preempted if you've got something higher priority to do with the LBC.
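
A small sketch of what each status read in such a polling loop would
have to decide.  It assumes the usual NAND status layout (bit 6 set
when the device is ready, bit 0 set when the last program/erase
failed); check the chip's datasheet before relying on it.  The enum
and `decode_status` are illustrative names, not existing driver code.

```c
#include <stdint.h>

/* Assumed status bits; confirm against the NAND datasheet. */
enum {
    NAND_STATUS_FAIL  = 0x01,  /* last program/erase failed */
    NAND_STATUS_READY = 0x40   /* device is ready for a new command */
};

enum erase_result { ERASE_BUSY, ERASE_OK, ERASE_FAILED };

/* Classify one status byte returned by a status read. */
static enum erase_result decode_status(uint8_t status)
{
    if (!(status & NAND_STATUS_READY))
        return ERASE_BUSY;      /* poll again later */
    if (status & NAND_STATUS_FAIL)
        return ERASE_FAILED;    /* erase finished with an error */
    return ERASE_OK;
}
```

Each poll is then one short LBC transaction, so higher-priority LBC
traffic can get in between polls.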

-Scott



Re: MPC831x (and others?) NAND erase performance improvements

2010-12-08 Thread Scott Wood
On Wed, 8 Dec 2010 17:02:45 -0500
Mark Mason ma...@postdiluvian.org wrote:

 I don't think that using a software NAND controller instead of the LBC
 FCM mode is all that bad.  Again, I haven't actually done it, so check
 the MTD docs, but I'm pretty sure the software is meant to do that, so
 it doesn't even really constitute a fix.  Assuming that it is
 supported then I doubt that configuring the NAND layer to use your
 setup would be any harder than configuring the FCM.

The MTD layer supports some really simple NAND controllers, but what do
you mean by not having a controller at all?  Hooking everything up to
GPIO?  Using UPM?

There is already a UPM NAND driver, BTW.

You would lose hardware ECC and the ability to be interrupt-driven (the
latter should be possible with SW changes, using GPIO interrupts).

-Scott



RE: MPC831x (and others?) NAND erase performance improvements

2010-12-07 Thread David Laight
 
 The problem cropped up when there was a lot of traffic to the NAND
 (Samsung K9WAGU08U1B-PIB0), with the NAND being on the LBC along with
 a video chip that needed constant and prompt attention.
 
 What I would see is that, as the writes happened, the erases would
 wind up batched and issued all at once, such that frequently 400-700
 erases were issued in rapid succession with a 1ms LBC BUSY cycle per
 erase.

Are those just the reads of the status register polling to
determine when the sector erase has completed ?
In which case a software delay between the reads might work.

Writes probably also have to be polled, but the individual
writes happen faster.

It is possible that an uncached read of another memory area
will stall the cpu long enough to allow another LBC master in.
One every few writes might be enough.
I had to do something similar on rather different hardware ...

David




Re: MPC831x (and others?) NAND erase performance improvements

2010-12-07 Thread Mark Mason
David Laight david.lai...@aculab.com wrote:

  The problem cropped up when there was a lot of traffic to the NAND
  (Samsung K9WAGU08U1B-PIB0), with the NAND being on the LBC along with
  a video chip that needed constant and prompt attention.
  
  What I would see is that, as the writes happened, the erases would
  wind up batched and issued all at once, such that frequently 400-700
  erases were issued in rapid succession with a 1ms LBC BUSY cycle per
  erase.
 
 Are those just the reads of the status register polling to
 determine when the sector erase has completed ?

No, it's not, since it isn't polling the status register.  It's using
a hardware line from the NAND to indicate that the NAND is busy.  That
one hardware line is shared between all devices on the bus, so if one
device says it's busy then all bus traffic stops until the NAND
deasserts the busy line.


Re: MPC831x (and others?) NAND erase performance improvements

2010-12-07 Thread Scott Wood
On Mon, 6 Dec 2010 22:15:54 -0500
Mark Mason ma...@postdiluvian.org wrote:

 A few months ago I ran into some performance problems involving
 UBI/NAND erases holding other devices off the LBC on an MPC8315.  I
 found a solution for this, which worked well, at least with the
 hardware I was working with.  I suspect the same problem affects other
 PPCs, probably including multicore devices, and maybe other
 architectures as well.
 
 I don't have experience with similar NAND controllers on other
 devices, so I'd like to explain what I found and see if someone who's
 more familiar with the family and/or driver can tell if this is
 useful.
 
 The problem cropped up when there was a lot of traffic to the NAND
 (Samsung K9WAGU08U1B-PIB0), with the NAND being on the LBC along with
 a video chip that needed constant and prompt attention.

If you attach NAND to the LBC, you should not attach anything else to
it which is latency-sensitive.

 What I found, though, was that the NAND did not inherently assert BUSY
 as part of the erase - BUSY was asserted because the driver polled for
 the status (NAND_CMD_STATUS).  If the status poll was delayed for the
 duration of the erase then the MPC could talk to the video chip while
 the erase was in progress.  At the end of the 1ms delay I would then
 poll for status, which would complete effectively immediately.

That's what we originally did.  The problem is that during this
interval the NAND chip will be driving the busy pin, which corrupts
other LBC transactions.

Newer chips have this added text in their reference manuals under "NAND
Flash Block Erase Command Sequence Example":

  Note that operations specified by OP3 and OP4 (status read) should
  never be skipped while erasing a NAND Flash device, because, in case
  that happens, contention may arise on LGPL4.  A possible case is that
  the next transaction from eLBC may try to use that pin as an output
  and since the NAND Flash device might already be driving it,
  contention will occur.  In case OP3 and OP4 operations are skipped,
  it may also happen that a new command is issued to the NAND Flash
  device even when the device has not yet finished processing the
  previous request.  This may also result in unpredictable behavior.


 Here's a code snippet from 2.6.37, with some comments I added.
 drivers/mtd/nand/fsl_elbc_nand.c - fsl_elbc_cmdfunc():
 
   /* ERASE2 uses the block and page address from ERASE1 */
   case NAND_CMD_ERASE2:
       dev_vdbg(priv->dev, "fsl_elbc_cmdfunc: NAND_CMD_ERASE2.\n");
 
       out_be32(&lbc->fir,
                (FIR_OP_CM0 << FIR_OP0_SHIFT) |  /* Execute CMD0 (ERASE1). */
                (FIR_OP_PA  << FIR_OP1_SHIFT) |  /* Issue block and page address. */
                (FIR_OP_CM2 << FIR_OP2_SHIFT) |  /* Execute CMD2 (ERASE2). */
                /* (delay needed here - this is where the erase happens) */
                (FIR_OP_CW1 << FIR_OP3_SHIFT) |  /* Wait for LFRB (BUSY) to deassert, */
                                                 /* then issue CW1 (read status). */
                (FIR_OP_RS  << FIR_OP4_SHIFT));  /* Read one byte. */
 
       out_be32(&lbc->fcr,
                (NAND_CMD_ERASE1 << FCR_CMD0_SHIFT) |  /* 0x60 */
                (NAND_CMD_STATUS << FCR_CMD1_SHIFT) |  /* 0x70 */
                (NAND_CMD_ERASE2 << FCR_CMD2_SHIFT));  /* 0xD0 */
 
       out_be32(&lbc->fbcr, 0);
       elbc_fcm_ctrl->read_bytes = 0;
       elbc_fcm_ctrl->use_mdr = 1;
 
       fsl_elbc_run_command(mtd);
       return;
 
 What I did was to issue two commands with fsl_elbc_run_command(), with
 a 1ms sleep in between (a tightloop delay worked almost as well, the
 important part was having 1ms between the erase and the status poll).
 The first command did the FIR_OP_CM0 (NAND_CMD_ERASE1), FIR_OP_PA, and
 FIR_OP_CM2 (NAND_CMD_ERASE2).  The second did the FIR_OP_CW1
 (NAND_CMD_STATUS) and FIR_OP_RS.

So essentially, you reverted commit
476459a6cf46d20ec73d9b211f3894ced5f9871e

:-)

Except for the 1ms delay.

 I know almost nothing at all about the scheduler, but I'm pretty sure
 that this behavior would cause the scheduler to think the video thread
 was a CPU hog, since the video thread was running for 1ms for every
 20us that the UBI BGT ran, which would cause the scheduler to unfairly
 prefer the UBI BGT.  I initially tried to address this problem with
 thread priorities, but the unfortunate reality was that either the
 NAND writes could fall behind or the video chip could fall behind, and
 there wasn't spare bandwidth to allow either.

If you set a realtime priority and have preemption enabled, you should
be able to avoid being delayed by more than one NAND transaction, until
the realtime thread sleeps.  Be careful to ensure that it does sleep
enough for other things to run.

-Scott



Re: MPC831x (and others?) NAND erase performance improvements

2010-12-07 Thread Mark Mason
Scott Wood scottw...@freescale.com wrote:

 On Mon, 6 Dec 2010 22:15:54 -0500
 Mark Mason ma...@postdiluvian.org wrote:
 
  A few months ago I ran into some performance problems involving
  UBI/NAND erases holding other devices off the LBC on an MPC8315.  I
  found a solution for this, which worked well, at least with the
  hardware I was working with.  I suspect the same problem affects other
  PPCs, probably including multicore devices, and maybe other
  architectures as well.
  
  I don't have experience with similar NAND controllers on other
  devices, so I'd like to explain what I found and see if someone who's
  more familiar with the family and/or driver can tell if this is
  useful.
  
  The problem cropped up when there was a lot of traffic to the NAND
  (Samsung K9WAGU08U1B-PIB0), with the NAND being on the LBC along with
  a video chip that needed constant and prompt attention.
 
 If you attach NAND to the LBC, you should not attach anything else
 to it which is latency-sensitive.

We found that out the hard way.

The 1ms latency wasn't a problem by itself, the real problem was that
the quantity of erases issued in a short time significantly decreased
the bandwidth available, and that the scheduler saw the video thread
use 1ms of CPU time even though it'd only done a couple hundred
nanoseconds worth of work.

  What I found, though, was that the NAND did not inherently assert BUSY
  as part of the erase - BUSY was asserted because the driver polled for
  the status (NAND_CMD_STATUS).  If the status poll was delayed for the
  duration of the erase then the MPC could talk to the video chip while
  the erase was in progress.  At the end of the 1ms delay I would then
  poll for status, which would complete effectively immediately.
 
 That's what we originally did.  The problem is that during this
 interval the NAND chip will be driving the busy pin, which corrupts
 other LBC transactions.

This is not what we observed with our flash part.  For a page erase,
the NAND did not assert the busy pin until the status read was done.
This was confirmed with a logic analyzer, and taking advantage of this
behavior is the sole purpose of the change.

I don't think that this behavior is what's described in the Samsung
datasheet, but it is what our parts did.

I incorrectly said "polled for status" in my original post.  It did
not poll for status, it monitored the busy line from NAND and did a
single read from the status register.

 Newer chips have this added text in their reference manuals under
 "NAND Flash Block Erase Command Sequence Example":
 
   Note that operations specified by OP3 and OP4 (status read) should
   never be skipped while erasing a NAND Flash device, because, in
   case that happens, contention may arise on LGPL4.  A possible case
   is that the next transaction from eLBC may try to use that pin as
   an output and since the NAND Flash device might already be driving
   it, contention will occur.  In case OP3 and OP4 operations are
   skipped, it may also happen that a new command is issued to the
   NAND Flash device even when the device has not yet finished
   processing the previous request.  This may also result in
   unpredictable behavior.

I would expect those operations to be mandatory.

  Here's a code snippet from 2.6.37, with some comments I added.
  drivers/mtd/nand/fsl_elbc_nand.c - fsl_elbc_cmdfunc():
  
    /* ERASE2 uses the block and page address from ERASE1 */
    case NAND_CMD_ERASE2:
        dev_vdbg(priv->dev, "fsl_elbc_cmdfunc: NAND_CMD_ERASE2.\n");
  
        out_be32(&lbc->fir,
                 (FIR_OP_CM0 << FIR_OP0_SHIFT) |  /* Execute CMD0 (ERASE1). */
                 (FIR_OP_PA  << FIR_OP1_SHIFT) |  /* Issue block and page address. */
                 (FIR_OP_CM2 << FIR_OP2_SHIFT) |  /* Execute CMD2 (ERASE2). */
                 /* (delay needed here - this is where the erase happens) */
                 (FIR_OP_CW1 << FIR_OP3_SHIFT) |  /* Wait for LFRB (BUSY) to deassert, */
                                                  /* then issue CW1 (read status). */
                 (FIR_OP_RS  << FIR_OP4_SHIFT));  /* Read one byte. */
  
        out_be32(&lbc->fcr,
                 (NAND_CMD_ERASE1 << FCR_CMD0_SHIFT) |  /* 0x60 */
                 (NAND_CMD_STATUS << FCR_CMD1_SHIFT) |  /* 0x70 */
                 (NAND_CMD_ERASE2 << FCR_CMD2_SHIFT));  /* 0xD0 */
  
        out_be32(&lbc->fbcr, 0);
        elbc_fcm_ctrl->read_bytes = 0;
        elbc_fcm_ctrl->use_mdr = 1;
  
        fsl_elbc_run_command(mtd);
        return;
  
  What I did was to issue two commands with fsl_elbc_run_command(), with
  a 1ms sleep in between (a tightloop delay worked almost as well, the
  important part was having 1ms between the erase and the status poll).
  The first command did the FIR_OP_CM0 (NAND_CMD_ERASE1), FIR_OP_PA, and
  FIR_OP_CM2 (NAND_CMD_ERASE2).  The second did the FIR_OP_CW1
  (NAND_CMD_STATUS) and FIR_OP_RS.
 
 So essentially, you reverted commit
 476459a6cf46d20ec73d9b211f3894ced5f9871e
 
 :-)
 
 Except for the 1ms delay.


Re: MPC831x (and others?) NAND erase performance improvements

2010-12-07 Thread Scott Wood
On Tue, 7 Dec 2010 18:24:45 -0500
Mark Mason ma...@postdiluvian.org wrote:

 Scott Wood scottw...@freescale.com wrote:
 
  On Mon, 6 Dec 2010 22:15:54 -0500
  Mark Mason ma...@postdiluvian.org wrote:
 
   What I found, though, was that the NAND did not inherently assert BUSY
   as part of the erase - BUSY was asserted because the driver polled for
   the status (NAND_CMD_STATUS).  If the status poll was delayed for the
   duration of the erase then the MPC could talk to the video chip while
   the erase was in progress.  At the end of the 1ms delay I would then
   poll for status, which would complete effectively immediately.
  
  That's what we originally did.  The problem is that during this
  interval the NAND chip will be driving the busy pin, which corrupts
  other LBC transactions.
 
 This is not what we observed with our flash part.  For a page erase,
 the NAND did not assert the busy pin until the status read was done.
 This was confirmed with a logic analyzer, and taking advantage of this
 behavior is the sole purpose of the change.

How would that work, in the normal case where you wait for busy to go
away before reading status?

We observed this corruption happening.  It was the motivation for
commit 476459a6cf46d20ec73d9b211f3894ced5f9871e.

  Newer chips have this added text in their reference manuals under
  "NAND Flash Block Erase Command Sequence Example":
  
Note that operations specified by OP3 and OP4 (status read) should
never be skipped while erasing a NAND Flash device, because, in
case that happens, contention may arise on LGPL4.  A possible case
is that the next transaction from eLBC may try to use that pin as
an output and since the NAND Flash device might already be driving
it, contention will occur.  In case OP3 and OP4 operations are
skipped, it may also happen that a new command is issued to the
NAND Flash device even when the device has not yet finished
processing the previous request.  This may also result in
unpredictable behavior.
 
 I would expect those operations to be mandatory.

...but you remove them from the original FIR.  They're not just
mandatory to be done eventually; they have to be done within the one
transaction that is atomic at the LBC.

   I know almost nothing at all about the scheduler, but I'm pretty
   sure that this behavior would cause the scheduler to think the
   video thread was a CPU hog, since the video thread was running for
   1ms for every 20us that the UBI BGT ran, which would cause the
   scheduler to unfairly prefer the UBI BGT.  I initially tried to
   address this problem with thread priorities, but the unfortunate
   reality was that either the NAND writes could fall behind or the
   video chip could fall behind, and there wasn't spare bandwidth to
   allow either.
  
  If you set a realtime priority and have preemption enabled, you
  should be able to avoid being delayed by more than one NAND
  transaction, until the realtime thread sleeps.  Be careful to ensure
  that it does sleep enough for other things to run.
 
 I tried that, but if the erases were held off enough to get the other
 bus bandwidth we required then the NAND writes fell behind and the
 kernel oom'd.


Another possibility (but still hackish) is to have the NAND driver
poll rather than be interrupt driven, and disable interrupts while
polling.  Then, whenever anything else runs (assuming no SMP), it
should be safe to access LBC without high latency -- so the scheduler
shouldn't get confused.

To make it somewhat cleaner, provide the same benefit on SMP, and allow
non-LBC things to run, you could use a mutex to synchronize between all
LBC users, though that's more work and could be a nuisance if you want
to access the LBC from an interrupt handler.

-Scott



MPC831x (and others?) NAND erase performance improvements

2010-12-06 Thread Mark Mason
A few months ago I ran into some performance problems involving
UBI/NAND erases holding other devices off the LBC on an MPC8315.  I
found a solution for this, which worked well, at least with the
hardware I was working with.  I suspect the same problem affects other
PPCs, probably including multicore devices, and maybe other
architectures as well.

I don't have experience with similar NAND controllers on other
devices, so I'd like to explain what I found and see if someone who's
more familiar with the family and/or driver can tell if this is
useful.

The problem cropped up when there was a lot of traffic to the NAND
(Samsung K9WAGU08U1B-PIB0), with the NAND being on the LBC along with
a video chip that needed constant and prompt attention.

What I would see is that, as the writes happened, the erases would
wind up batched and issued all at once, such that frequently 400-700
erases were issued in rapid succession with a 1ms LBC BUSY cycle per
erase.  BUSY was shared with all of the devices on the LBC, so the PPC
could not talk to the video chip as long as BUSY was asserted by the
NAND.  This would give us a window of up to 700ms in which the PPC
could manage very little communication with other devices on the LBC -
in our case the video chip, for which this delay was essentially
fatal.  I suspect that some multicore chips might have one core
effectively halt if that core attempts to access the LBC while the
other core (or itself, for that matter) is executing an erase (if they
have a similar NAND controller).

What I found, though, was that the NAND did not inherently assert BUSY
as part of the erase - BUSY was asserted because the driver polled for
the status (NAND_CMD_STATUS).  If the status poll was delayed for the
duration of the erase then the MPC could talk to the video chip while
the erase was in progress.  At the end of the 1ms delay I would then
poll for status, which would complete effectively immediately.

Here's a code snippet from 2.6.37, with some comments I added.
drivers/mtd/nand/fsl_elbc_nand.c - fsl_elbc_cmdfunc():

  /* ERASE2 uses the block and page address from ERASE1 */
  case NAND_CMD_ERASE2:
      dev_vdbg(priv->dev, "fsl_elbc_cmdfunc: NAND_CMD_ERASE2.\n");

      out_be32(&lbc->fir,
               (FIR_OP_CM0 << FIR_OP0_SHIFT) |  /* Execute CMD0 (ERASE1). */
               (FIR_OP_PA  << FIR_OP1_SHIFT) |  /* Issue block and page address. */
               (FIR_OP_CM2 << FIR_OP2_SHIFT) |  /* Execute CMD2 (ERASE2). */
               /* (delay needed here - this is where the erase happens) */
               (FIR_OP_CW1 << FIR_OP3_SHIFT) |  /* Wait for LFRB (BUSY) to deassert, */
                                                /* then issue CW1 (read status). */
               (FIR_OP_RS  << FIR_OP4_SHIFT));  /* Read one byte. */

      out_be32(&lbc->fcr,
               (NAND_CMD_ERASE1 << FCR_CMD0_SHIFT) |  /* 0x60 */
               (NAND_CMD_STATUS << FCR_CMD1_SHIFT) |  /* 0x70 */
               (NAND_CMD_ERASE2 << FCR_CMD2_SHIFT));  /* 0xD0 */

      out_be32(&lbc->fbcr, 0);
      elbc_fcm_ctrl->read_bytes = 0;
      elbc_fcm_ctrl->use_mdr = 1;

      fsl_elbc_run_command(mtd);
      return;

What I did was to issue two commands with fsl_elbc_run_command(), with
a 1ms sleep in between (a tightloop delay worked almost as well, the
important part was having 1ms between the erase and the status poll).
The first command did the FIR_OP_CM0 (NAND_CMD_ERASE1), FIR_OP_PA, and
FIR_OP_CM2 (NAND_CMD_ERASE2).  The second did the FIR_OP_CW1
(NAND_CMD_STATUS) and FIR_OP_RS.
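
To make the split concrete, here is a standalone sketch that only
builds the FIR words for the single-sequence erase and for the two
shorter sequences; it does not touch hardware.  The opcode and shift
values are assumptions, believed to mirror the kernel's fsl_lbc.h, and
should be checked against the header and the reference manual.  The
three function names are made up for this example.  Replies in this
thread argue that splitting the sequence can cause contention on the
busy pin, so treat this as an illustration of the change rather than a
recommended fix.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed FCM opcode values; verify against fsl_lbc.h. */
enum {
    FIR_OP_PA  = 0x2,  /* issue block/page address */
    FIR_OP_CM0 = 0x4,  /* issue the command held in FCR[CMD0] */
    FIR_OP_CM2 = 0x6,  /* issue the command held in FCR[CMD2] */
    FIR_OP_RS  = 0xB,  /* read one status byte */
    FIR_OP_CW1 = 0xD   /* wait for LFRB, then issue FCR[CMD1] */
};

/* Assumed bit positions of the FIR opcode slots. */
enum {
    FIR_OP0_SHIFT = 28,
    FIR_OP1_SHIFT = 24,
    FIR_OP2_SHIFT = 20,
    FIR_OP3_SHIFT = 16,
    FIR_OP4_SHIFT = 12
};

/* Place a 4-bit opcode into one FIR slot. */
static uint32_t fir_op(uint32_t op, unsigned shift)
{
    assert(op <= 0xF);
    return op << shift;
}

/* Stock sequence: erase, wait for BUSY, read status, all in one FIR. */
static uint32_t erase_fir_single(void)
{
    return fir_op(FIR_OP_CM0, FIR_OP0_SHIFT) |
           fir_op(FIR_OP_PA,  FIR_OP1_SHIFT) |
           fir_op(FIR_OP_CM2, FIR_OP2_SHIFT) |
           fir_op(FIR_OP_CW1, FIR_OP3_SHIFT) |
           fir_op(FIR_OP_RS,  FIR_OP4_SHIFT);
}

/* First half of the split: issue the erase and return immediately. */
static uint32_t erase_fir_issue_only(void)
{
    return fir_op(FIR_OP_CM0, FIR_OP0_SHIFT) |
           fir_op(FIR_OP_PA,  FIR_OP1_SHIFT) |
           fir_op(FIR_OP_CM2, FIR_OP2_SHIFT);
}

/* Second half, run after the delay: wait for BUSY and read status. */
static uint32_t erase_fir_status_only(void)
{
    return fir_op(FIR_OP_CW1, FIR_OP0_SHIFT) |
           fir_op(FIR_OP_RS,  FIR_OP1_SHIFT);
}
```

The sleep between the two halves is where other LBC traffic gets its
chance to run.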

For a bit more detail...  fsl_elbc_run_command() would put the thread
issuing the erase to sleep so other threads could run.  That did work
as planned, except that I was working with a fairly pathological case
- there was a very high volume of writes to the NAND, and the video
chip required very frequent and prompt attention.  This meant that the
thread that was most likely to run when the NAND erase was in progress
was the thread that serviced the video chip.

A logic analyzer backed this up.  It would show the erase being
issued, BUSY (R/B# or LFRB) being asserted for 1ms, one or two 16 bit
transactions to the video chip, then another erase, repeating this
process hundreds of times in a row.  The UBI BGT would run long enough
to issue an erase (probably on the order of 20us) then go to sleep.
The video thread would then run, and issue a transaction to the chip.
That transaction would get blocked until BUSY deasserted, at which
point the thread would appear to have run for 1ms, even though it had
only executed a single bus transaction.
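
The arithmetic behind those numbers, as a tiny sketch.  The helper
names are made up for this example, and the inputs are the rough
figures quoted in this message (400-700 erases, about 1ms of BUSY per
erase, about 20us for the UBI BGT to issue one), not measurements of
any other system.

```c
#include <assert.h>

/* Total time the LBC is held busy by a burst of back-to-back erases. */
static unsigned burst_stall_ms(unsigned erases, unsigned busy_ms_per_erase)
{
    return erases * busy_ms_per_erase;
}

/*
 * How much more run time gets charged to the blocked thread than to
 * the thread that issued the erase, per erase cycle.
 */
static unsigned charged_ratio(unsigned blocked_us, unsigned issuer_us)
{
    assert(issuer_us > 0);
    return blocked_us / issuer_us;
}
```

With those inputs, a burst of 400 to 700 erases holds the bus for 400
to 700ms, and the video thread is charged roughly 50 times as much run
time as the thread that actually started each erase.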

I know almost nothing at all about the scheduler, but I'm pretty sure
that this behavior would cause the scheduler to think the video thread
was a CPU hog, since the video thread was running for 1ms for every
20us that the UBI BGT ran, which would cause the scheduler to unfairly
prefer the UBI BGT.  I initially tried to address this problem with
thread priorities, but the unfortunate reality was that either the
NAND writes could fall behind or the video chip could