RE: MPC831x (and others?) NAND erase performance improvements
>> An external IRQ line would let you limit interrupts to rising edges rather than all edges, though you'd lose the ability to directly read the line status.
>
> oh, one cannot read the IRQ line? didn't know that. Also I'm not sure all Freescale CPUs can do rising edge.

I suspect that you may be able to leave the interrupt masked, but still read the 'interrupt pending' register, which would have the same effect.

Our HW engineers tend to feed everything into an FPGA since it gives them a lot more flexibility over pin connections. In that case the inverter is trivial (and the FPGA interface can read the status!).

David
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: MPC831x (and others?) NAND erase performance improvements
David Laight david.lai...@aculab.com wrote on 2010/12/13 09:33:37:
>>> An external IRQ line would let you limit interrupts to rising edges rather than all edges, though you'd lose the ability to directly read the line status.
>>
>> oh, one cannot read the IRQ line? didn't know that. Also I'm not sure all Freescale CPUs can do rising edge.
>
> I suspect that you may be able to leave the interrupt masked, but still read the 'interrupt pending' register. Which would have the same effect.

Ah, that should work too. I should be able to read the 'interrupt pending' register at all times, even when it isn't masked.

What if one has several NAND chips to build a big FS? Is the NAND controller equipped to handle that?

> Our HW engineers tend to feed everything into an FPGA since it gives them a lot more flexibility over pin connections. In which case the inverter is trivial. (and the FPGA interface can read the status!)

Yes, but not all of our boards have an FPGA, and we load the FPGA from the SW, so it is a chicken-and-egg problem for us.

Jocke
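The "leave it masked, read the pending register" idea can be sketched roughly as below. This is only an illustration: the bit position `NAND_RDY_PENDING_BIT` and the register pointer are hypothetical, not taken from the IPIC register map, and real hardware would also need the latched edge cleared in whatever way the interrupt controller's manual prescribes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position of the NAND ready line in a pending register. */
#define NAND_RDY_PENDING_BIT (1u << 3)

/*
 * Returns true if the ready edge has been latched as pending.
 * `pending` stands in for a memory-mapped "interrupt pending" register;
 * the interrupt itself stays masked, so no handler ever runs.
 */
static bool nand_ready_latched(const volatile uint32_t *pending)
{
    return (*pending & NAND_RDY_PENDING_BIT) != 0;
}

/* Poll at most max_polls times; return true once ready has been seen. */
static bool nand_wait_ready(const volatile uint32_t *pending, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (nand_ready_latched(pending))
            return true;
    }
    return false;
}
```

The bounded loop is a deliberate choice: an unbounded spin on a line that never toggles would hang the caller, so a driver would normally turn the `false` return into a timeout error.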
Re: MPC831x (and others?) NAND erase performance improvements
On Mon, 13 Dec 2010 11:32:00 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
> David Laight david.lai...@aculab.com wrote on 2010/12/13 09:33:37:
>>>> An external IRQ line would let you limit interrupts to rising edges rather than all edges, though you'd lose the ability to directly read the line status.
>>>
>>> oh, one cannot read the IRQ line? didn't know that. Also I'm not sure all Freescale CPUs can do rising edge.

Ah right, 83xx has IPIC rather than MPIC.

>> I suspect that you may be able to leave the interrupt masked, but still read the 'interrupt pending' register. Which would have the same effect.
>
> Ah, that should work too. I should be able to read the 'interrupt pending' register at all times, even when it isn't masked.

This could work OK if you have board logic to invert the signal.

> What if one has several NAND chips to build a big FS? Is the NAND controller equipped to handle that?

FCM can drive one NAND chip per eLBC chipselect, though possibly you could go beyond that with a board-logic chipselect mechanism.

-Scott
Re: MPC831x (and others?) NAND erase performance improvements
Scott Wood scottw...@freescale.com wrote on 2010/12/13 18:33:56:
> On Mon, 13 Dec 2010 11:32:00 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>> David Laight david.lai...@aculab.com wrote on 2010/12/13 09:33:37:
>>>>> An external IRQ line would let you limit interrupts to rising edges rather than all edges, though you'd lose the ability to directly read the line status.
>>>>
>>>> oh, one cannot read the IRQ line? didn't know that. Also I'm not sure all Freescale CPUs can do rising edge.
>
> Ah right, 83xx has IPIC rather than MPIC.
>
>>> I suspect that you may be able to leave the interrupt masked, but still read the 'interrupt pending' register. Which would have the same effect.
>>
>> Ah, that should work too. I should be able to read the 'interrupt pending' register at all times, even when it isn't masked.
>
> This could work OK if you have board logic to invert the signal.

yeah, just a NAND gate :)

>> What if one has several NAND chips to build a big FS? Is the NAND controller equipped to handle that?
>
> FCM can drive one NAND chip per eLBC chipselect, though possibly you could go beyond that with a board-logic chipselect mechanism.

hmm, then I guess one would have to use one GPIO/IRQ per NAND chip?
Re: MPC831x (and others?) NAND erase performance improvements
On Mon, 13 Dec 2010 18:41:32 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
> Scott Wood scottw...@freescale.com wrote on 2010/12/13 18:33:56:
>> On Mon, 13 Dec 2010 11:32:00 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>>> What if one has several NAND chips to build a big FS? Is the NAND controller equipped to handle that?
>>
>> FCM can drive one NAND chip per eLBC chipselect, though possibly you could go beyond that with a board-logic chipselect mechanism.
>
> hmm, then I guess one would have to use one GPIO/IRQ per NAND chip?

Couldn't you just tie together all the open-drain busy lines before you invert it? You'll only be driving one NAND chip at a time anyway; the others should not be asserting busy.

-Scott
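For readers less familiar with open-drain wiring, the suggestion above can be modelled as a small truth function. This is a logic sketch only, assuming active-low busy outputs sharing one pull-up; the function names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Each chip either pulls the shared busy wire low (busy) or releases it.
 * With open-drain outputs and a single pull-up, the wire is high only
 * when no chip is pulling it low.
 */
static bool shared_busy_wire_level(const bool *pulling_low, size_t nchips)
{
    for (size_t i = 0; i < nchips; i++) {
        if (pulling_low[i])
            return false;   /* any busy chip wins */
    }
    return true;            /* all released: the pull-up drives the wire high */
}

/* Board-level inverter: its output is high while any chip is busy. */
static bool inverted_busy(const bool *pulling_low, size_t nchips)
{
    return !shared_busy_wire_level(pulling_low, nchips);
}
```

Because only one chip is driven at a time, the combined signal is enough to tell when the active chip has finished, so a single GPIO or IRQ input can serve several chips.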
Re: MPC831x (and others?) NAND erase performance improvements
Scott Wood scottw...@freescale.com wrote on 2010/12/13 18:51:31:
> On Mon, 13 Dec 2010 18:41:32 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>> Scott Wood scottw...@freescale.com wrote on 2010/12/13 18:33:56:
>>> On Mon, 13 Dec 2010 11:32:00 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>>>> What if one has several NAND chips to build a big FS? Is the NAND controller equipped to handle that?
>>>
>>> FCM can drive one NAND chip per eLBC chipselect, though possibly you could go beyond that with a board-logic chipselect mechanism.
>>
>> hmm, then I guess one would have to use one GPIO/IRQ per NAND chip?
>
> Couldn't you just tie together all the open-drain busy lines before you invert it? You'll only be driving one NAND chip at a time anyway; the others should not be asserting busy.

hmm, I guess that would work (didn't know they were open-drain), thanks. Is that how the FCM does it?

Jocke
Re: MPC831x (and others?) NAND erase performance improvements
On Mon, 13 Dec 2010 20:30:27 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
> Scott Wood scottw...@freescale.com wrote on 2010/12/13 18:51:31:
>> On Mon, 13 Dec 2010 18:41:32 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>>> Scott Wood scottw...@freescale.com wrote on 2010/12/13 18:33:56:
>>>> On Mon, 13 Dec 2010 11:32:00 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>>>>> What if one has several NAND chips to build a big FS? Is the NAND controller equipped to handle that?
>>>>
>>>> FCM can drive one NAND chip per eLBC chipselect, though possibly you could go beyond that with a board-logic chipselect mechanism.
>>>
>>> hmm, then I guess one would have to use one GPIO/IRQ per NAND chip?
>>
>> Couldn't you just tie together all the open-drain busy lines before you invert it? You'll only be driving one NAND chip at a time anyway; the others should not be asserting busy.
>
> hmm, I guess that would work (didn't know they were open-drain), thanks. Is that how the FCM does it?

Yes, that's what started this discussion. :-)

The problem there is that they share the line with all chipselects, NAND or otherwise.

-Scott
Re: MPC831x (and others?) NAND erase performance improvements
Scott Wood scottw...@freescale.com wrote on 2010/12/13 20:49:50:
> On Mon, 13 Dec 2010 20:30:27 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>> Scott Wood scottw...@freescale.com wrote on 2010/12/13 18:51:31:
>>> On Mon, 13 Dec 2010 18:41:32 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>>>> Scott Wood scottw...@freescale.com wrote on 2010/12/13 18:33:56:
>>>>> On Mon, 13 Dec 2010 11:32:00 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>>>>>> What if one has several NAND chips to build a big FS? Is the NAND controller equipped to handle that?
>>>>>
>>>>> FCM can drive one NAND chip per eLBC chipselect, though possibly you could go beyond that with a board-logic chipselect mechanism.
>>>>
>>>> hmm, then I guess one would have to use one GPIO/IRQ per NAND chip?
>>>
>>> Couldn't you just tie together all the open-drain busy lines before you invert it? You'll only be driving one NAND chip at a time anyway; the others should not be asserting busy.
>>
>> hmm, I guess that would work (didn't know they were open-drain), thanks. Is that how the FCM does it?
>
> Yes, that's what started this discussion. :-)

True, I must be getting old :)

> The problem there is that they share the line with all chipselects, NAND or otherwise.

Right, thanks for reminding me.
Re: MPC831x (and others?) NAND erase performance improvements
Scott Wood scottw...@freescale.com wrote on 2010/12/10 18:56:39:
> On Fri, 10 Dec 2010 13:39:01 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>> Scott Wood scottw...@freescale.com wrote on 2010/12/08 23:25:59:
>>> On Wed, 8 Dec 2010 17:02:45 -0500 Mark Mason ma...@postdiluvian.org wrote:
>>>> I don't think that using a software NAND controller instead of the LBC FCM mode is all that bad. Again, I haven't actually done it, so check the MTD docs, but I'm pretty sure the software is meant to do that, so it doesn't even really constitute a fix. Assuming that it is supported then I doubt that configuring the NAND layer to use your setup would be any harder than configuring the FCM.
>>>
>>> The MTD layer supports some really simple NAND controllers, but what do you mean by not having a controller at all? Hooking everything up to GPIO? Using UPM? There is already a UPM NAND driver, BTW. You would lose hardware ECC and the ability to be interrupt-driven (the latter should be possible with SW changes, using GPIO interrupts).
>>
>> hmm, you think it would be possible to use one of the IRQ pins instead?
>
> GPIO should be fine, software just needs to be changed to use the interrupt functionality. An external IRQ line would let you limit interrupts to rising edges rather than all edges, though you'd lose the ability to directly read the line status.

oh, one cannot read the IRQ line? didn't know that. Also I'm not sure all Freescale CPUs can do rising edge.
Re: MPC831x (and others?) NAND erase performance improvements
Scott,

do you think this issue also applies to the MPC8377?

I'm in the middle of a small redesign for series production and would like not to miss a thing. We have NAND, NOR and MRAM connected to the LBC. Since the RFS is running from NAND and we use the MRAM as a non-volatile SRAM, I'd like to avoid being hit by this issue.

Any comments from your side?

Regards,
André

> On Wed, 8 Dec 2010 22:26:59 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>> Scott Wood scottw...@freescale.com wrote on 2010/12/08 21:25:51:
>>> On Wed, 8 Dec 2010 21:11:08 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>>>> Scott Wood scottw...@freescale.com wrote on 2010/12/08 20:59:28:
>>>>> On Wed, 8 Dec 2010 20:57:03 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>>>>>> Can you think of any workaround such as not connecting the BUSY pin at all?
>>>>>
>>>>> Maybe connect the busy pin to a gpio?
>>>>
>>>> Is BUSY required for sane operation or is it an optimization?
>>>
>>> You could probably get away without it by inserting delays if you know the chip specs well enough.
>>
>> Urgh, that does not feel like a good solution.
>
> No, but you asked if it could be done, and if it was just a performance issue. :-)
>
>>>> Is there any risk that the NAND device will drive the LB and corrupt the bus for other devices?
>>>
>>> I think the only thing the NAND chip should be driving is the busy pin,
>>
>> OK, good. What function is actually lost if one uses a GPIO instead of BUSY?
>
> Not much, if you enable interrupts on the GPIO pin. The driver would have to be reworked a bit, of course.
>
>> You think Freescale could test and validate a GPIO solution? I don't think we will be very happy to design our board around an unproven workaround.
>
> Ask your sales/support contacts.
>
>> An even better workaround would be if one could add logic between the NAND and the CPU which would compensate for this defect without needing special SW fixes.
>
> The problem with that is when would you assert the chipselect again to check if it's done? Current SW depends on being able to tell the LBC to interrupt (or take other action) when busy goes away. I suppose you could poll with status reads, which could at least be preempted if you've got something higher priority to do with the LBC.
>
> -Scott

MATRIX VISION GmbH, Talstrasse 16, DE-71570 Oppenweiler
Registergericht: Amtsgericht Stuttgart, HRB 271090
Geschaeftsfuehrer: Gerhard Thullner, Werner Armingeon, Uwe Furtner
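The "poll with status reads" idea quoted above could look something like the following. It is a minimal sketch, not the FCM driver: the `nand_bus` accessors and the `fake_nand` chip are hypothetical stand-ins so the loop can be exercised without hardware. The only values taken as given are the standard NAND read-status command (0x70) and the ready flag in bit 6 of the status byte.

```c
#include <stdint.h>

#define NAND_CMD_STATUS   0x70u  /* standard NAND "read status" command */
#define NAND_STATUS_READY 0x40u  /* status bit 6: 1 = ready, 0 = busy */

/* Hypothetical bus accessors; a real driver would talk to its controller. */
struct nand_bus {
    void    (*write_cmd)(void *ctx, uint8_t cmd);
    uint8_t (*read_byte)(void *ctx);
    void    *ctx;
};

/* Simulated chip that reports busy for a fixed number of status reads. */
struct fake_nand {
    unsigned busy_reads_left;
    uint8_t  last_cmd;
};

static void fake_write_cmd(void *ctx, uint8_t cmd)
{
    ((struct fake_nand *)ctx)->last_cmd = cmd;
}

static uint8_t fake_read_byte(void *ctx)
{
    struct fake_nand *chip = ctx;

    if (chip->last_cmd != NAND_CMD_STATUS)
        return 0;
    if (chip->busy_reads_left > 0) {
        chip->busy_reads_left--;
        return 0;                    /* ready bit clear: still busy */
    }
    return NAND_STATUS_READY;
}

/*
 * Each iteration is one short bus transaction. Between iterations the bus
 * is free, which is where higher-priority traffic could be interleaved.
 * Returns the number of busy polls seen before ready, or -1 on timeout.
 */
static int nand_poll_ready(const struct nand_bus *bus, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        bus->write_cmd(bus->ctx, NAND_CMD_STATUS);
        if (bus->read_byte(bus->ctx) & NAND_STATUS_READY)
            return (int)i;
    }
    return -1;
}
```

The trade-off compared with a hardware-monitored busy line is CPU time and bus transactions spent polling, in exchange for never holding the bus for the duration of an erase.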
Re: MPC831x (and others?) NAND erase performance improvements
Andre Schwarz andre.schw...@matrix-vision.de wrote on 2010/12/10 09:47:10:
> Scott, do you think this issue also applies to the MPC8377?

Probably, I think this is so for all eLBC controllers.

> I'm in the middle of a small redesign for series production and would like not to miss a thing. We have NAND, NOR and MRAM connected to the LBC. Since the RFS is running from NAND and we use the MRAM as a non-volatile SRAM, I'd like to avoid being hit by this issue.

Please report back, I really want to know if this works and if there are any drawbacks.

> Any comments from your side?
>
> Regards,
> André
>
>> On Wed, 8 Dec 2010 22:26:59 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>>> Scott Wood scottw...@freescale.com wrote on 2010/12/08 21:25:51:
>>>> On Wed, 8 Dec 2010 21:11:08 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>>>>> Scott Wood scottw...@freescale.com wrote on 2010/12/08 20:59:28:
>>>>>> On Wed, 8 Dec 2010 20:57:03 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>>>>>>> Can you think of any workaround such as not connecting the BUSY pin at all?
>>>>>>
>>>>>> Maybe connect the busy pin to a gpio?
>>>>>
>>>>> Is BUSY required for sane operation or is it an optimization?
>>>>
>>>> You could probably get away without it by inserting delays if you know the chip specs well enough.
>>>
>>> Urgh, that does not feel like a good solution.
>>
>> No, but you asked if it could be done, and if it was just a performance issue. :-)
>>
>>>>> Is there any risk that the NAND device will drive the LB and corrupt the bus for other devices?
>>>>
>>>> I think the only thing the NAND chip should be driving is the busy pin,
>>>
>>> OK, good. What function is actually lost if one uses a GPIO instead of BUSY?
>>
>> Not much, if you enable interrupts on the GPIO pin. The driver would have to be reworked a bit, of course.
>>
>>> You think Freescale could test and validate a GPIO solution? I don't think we will be very happy to design our board around an unproven workaround.
>>
>> Ask your sales/support contacts.
>>
>>> An even better workaround would be if one could add logic between the NAND and the CPU which would compensate for this defect without needing special SW fixes.
>>
>> The problem with that is when would you assert the chipselect again to check if it's done? Current SW depends on being able to tell the LBC to interrupt (or take other action) when busy goes away. I suppose you could poll with status reads, which could at least be preempted if you've got something higher priority to do with the LBC.
>>
>> -Scott
Re: MPC831x (and others?) NAND erase performance improvements
Scott Wood scottw...@freescale.com wrote on 2010/12/08 23:25:59:
> On Wed, 8 Dec 2010 17:02:45 -0500 Mark Mason ma...@postdiluvian.org wrote:
>> I don't think that using a software NAND controller instead of the LBC FCM mode is all that bad. Again, I haven't actually done it, so check the MTD docs, but I'm pretty sure the software is meant to do that, so it doesn't even really constitute a fix. Assuming that it is supported then I doubt that configuring the NAND layer to use your setup would be any harder than configuring the FCM.
>
> The MTD layer supports some really simple NAND controllers, but what do you mean by not having a controller at all? Hooking everything up to GPIO? Using UPM? There is already a UPM NAND driver, BTW. You would lose hardware ECC and the ability to be interrupt-driven (the latter should be possible with SW changes, using GPIO interrupts).

hmm, you think it would be possible to use one of the IRQ pins instead?
Re: MPC831x (and others?) NAND erase performance improvements
On Fri, 10 Dec 2010 13:39:01 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
> Scott Wood scottw...@freescale.com wrote on 2010/12/08 23:25:59:
>> On Wed, 8 Dec 2010 17:02:45 -0500 Mark Mason ma...@postdiluvian.org wrote:
>>> I don't think that using a software NAND controller instead of the LBC FCM mode is all that bad. Again, I haven't actually done it, so check the MTD docs, but I'm pretty sure the software is meant to do that, so it doesn't even really constitute a fix. Assuming that it is supported then I doubt that configuring the NAND layer to use your setup would be any harder than configuring the FCM.
>>
>> The MTD layer supports some really simple NAND controllers, but what do you mean by not having a controller at all? Hooking everything up to GPIO? Using UPM? There is already a UPM NAND driver, BTW. You would lose hardware ECC and the ability to be interrupt-driven (the latter should be possible with SW changes, using GPIO interrupts).
>
> hmm, you think it would be possible to use one of the IRQ pins instead?

GPIO should be fine, software just needs to be changed to use the interrupt functionality. An external IRQ line would let you limit interrupts to rising edges rather than all edges, though you'd lose the ability to directly read the line status.

-Scott
Re: MPC831x (and others?) NAND erase performance improvements
> On Mon, 6 Dec 2010 22:15:54 -0500 Mark Mason ma...@postdiluvian.org wrote:
>> A few months ago I ran into some performance problems involving UBI/NAND erases holding other devices off the LBC on an MPC8315. I found a solution for this, which worked well, at least with the hardware I was working with. I suspect the same problem affects other PPCs, probably including multicore devices, and maybe other architectures as well.
>>
>> I don't have experience with similar NAND controllers on other devices, so I'd like to explain what I found and see if someone who's more familiar with the family and/or driver can tell if this is useful.
>>
>> The problem cropped up when there was a lot of traffic to the NAND (Samsung K9WAGU08U1B-PIB0), with the NAND being on the LBC along with a video chip that needed constant and prompt attention.
>
> If you attach NAND to the LBC, you should not attach anything else to it which is latency-sensitive.

This feature makes the LBC useless to us. Is there some workaround or plan to address this limitation?

Jocke
Re: MPC831x (and others?) NAND erase performance improvements
On Wed, 8 Dec 2010 08:59:49 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>> On Mon, 6 Dec 2010 22:15:54 -0500 Mark Mason ma...@postdiluvian.org wrote:
>>> A few months ago I ran into some performance problems involving UBI/NAND erases holding other devices off the LBC on an MPC8315. I found a solution for this, which worked well, at least with the hardware I was working with. I suspect the same problem affects other PPCs, probably including multicore devices, and maybe other architectures as well.
>>>
>>> I don't have experience with similar NAND controllers on other devices, so I'd like to explain what I found and see if someone who's more familiar with the family and/or driver can tell if this is useful.
>>>
>>> The problem cropped up when there was a lot of traffic to the NAND (Samsung K9WAGU08U1B-PIB0), with the NAND being on the LBC along with a video chip that needed constant and prompt attention.
>>
>> If you attach NAND to the LBC, you should not attach anything else to it which is latency-sensitive.
>
> This feature makes the LBC useless to us. Is there some workaround or plan to address this limitation?

Complain to your support or sales contact. I've complained about it in the past, and got a "but pins are a limited resource!" response. They need to hear that it's a problem from customers.

-Scott
Re: MPC831x (and others?) NAND erase performance improvements
Scott Wood scottw...@freescale.com wrote on 2010/12/08 18:18:39:
> On Wed, 8 Dec 2010 08:59:49 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>>> On Mon, 6 Dec 2010 22:15:54 -0500 Mark Mason ma...@postdiluvian.org wrote:
>>>> A few months ago I ran into some performance problems involving UBI/NAND erases holding other devices off the LBC on an MPC8315. I found a solution for this, which worked well, at least with the hardware I was working with. I suspect the same problem affects other PPCs, probably including multicore devices, and maybe other architectures as well.
>>>>
>>>> I don't have experience with similar NAND controllers on other devices, so I'd like to explain what I found and see if someone who's more familiar with the family and/or driver can tell if this is useful.
>>>>
>>>> The problem cropped up when there was a lot of traffic to the NAND (Samsung K9WAGU08U1B-PIB0), with the NAND being on the LBC along with a video chip that needed constant and prompt attention.
>>>
>>> If you attach NAND to the LBC, you should not attach anything else to it which is latency-sensitive.
>>
>> This feature makes the LBC useless to us. Is there some workaround or plan to address this limitation?
>
> Complain to your support or sales contact. I've complained about it in the past, and got a "but pins are a limited resource!" response. They need to hear that it's a problem from customers.

Done, let's see what I get in return. I think this problem will be a major obstacle for our next-generation boards, which will be NAND based.

Jocke
Re: MPC831x (and others?) NAND erase performance improvements
Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
> Scott Wood scottw...@freescale.com wrote on 2010/12/08 18:18:39:
>> On Wed, 8 Dec 2010 08:59:49 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>>>> On Mon, 6 Dec 2010 22:15:54 -0500 Mark Mason ma...@postdiluvian.org wrote:
>>>>> A few months ago I ran into some performance problems involving UBI/NAND erases holding other devices off the LBC on an MPC8315. I found a solution for this, which worked well, at least with the hardware I was working with. I suspect the same problem affects other PPCs, probably including multicore devices, and maybe other architectures as well.
>>>>>
>>>>> I don't have experience with similar NAND controllers on other devices, so I'd like to explain what I found and see if someone who's more familiar with the family and/or driver can tell if this is useful.
>>>>>
>>>>> The problem cropped up when there was a lot of traffic to the NAND (Samsung K9WAGU08U1B-PIB0), with the NAND being on the LBC along with a video chip that needed constant and prompt attention.
>>>>
>>>> If you attach NAND to the LBC, you should not attach anything else to it which is latency-sensitive.
>>>
>>> This feature makes the LBC useless to us. Is there some workaround or plan to address this limitation?
>>
>> Complain to your support or sales contact. I've complained about it in the past, and got a "but pins are a limited resource!" response. They need to hear that it's a problem from customers.
>
> Done, let's see what I get in return. I think this problem will be a major obstacle for our next-generation boards, which will be NAND based.

It was a big problem, and a big surprise, for me too. The next generation of a couple of the chips on the bus have PCIe, but those are noticeably more expensive.

Another problem I ran into was that the DMA performance from a non-incrementing address was abysmal; PIO turned out to be significantly faster. I guess internally the bus does an entire cacheline transfer for every word read from a fixed address, or something like that. I was doing DMA from a device that had only six address bits; it should have been in the middle of the bus with the bottom address pins not connected, which would have allowed incrementing-address DMA. The transfer speed wasn't so much of a problem, but the longer transfers meant that there was that much less bus bandwidth for the other devices, so we wound up sacrificing CPU to get more bus bandwidth.
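For illustration, the PIO alternative to fixed-address DMA amounts to a loop like the one below. This is a generic sketch, not the code used on the board in question: the 32-bit word width and the function name are assumptions, and kernel code would go through the platform's I/O accessors rather than a bare volatile pointer.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy `count` words from a single, non-incrementing device register into
 * a RAM buffer. The volatile qualifier forces one real bus read per word,
 * so the CPU issues exactly the accesses the device expects.
 */
static void pio_read_fifo(const volatile uint32_t *fifo, uint32_t *dst, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = *fifo;
}
```

The cost is that the CPU is occupied for the whole transfer, which matches the trade described above: CPU time is spent to keep each bus transaction short.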
Re: MPC831x (and others?) NAND erase performance improvements
Mark Mason ma...@postdiluvian.org wrote on 2010/12/08 20:26:16:
> Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>> Scott Wood scottw...@freescale.com wrote on 2010/12/08 18:18:39:
>>> On Wed, 8 Dec 2010 08:59:49 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>>>>> On Mon, 6 Dec 2010 22:15:54 -0500 Mark Mason ma...@postdiluvian.org wrote:
>>>>>> A few months ago I ran into some performance problems involving UBI/NAND erases holding other devices off the LBC on an MPC8315. I found a solution for this, which worked well, at least with the hardware I was working with. I suspect the same problem affects other PPCs, probably including multicore devices, and maybe other architectures as well.
>>>>>>
>>>>>> I don't have experience with similar NAND controllers on other devices, so I'd like to explain what I found and see if someone who's more familiar with the family and/or driver can tell if this is useful.
>>>>>>
>>>>>> The problem cropped up when there was a lot of traffic to the NAND (Samsung K9WAGU08U1B-PIB0), with the NAND being on the LBC along with a video chip that needed constant and prompt attention.
>>>>>
>>>>> If you attach NAND to the LBC, you should not attach anything else to it which is latency-sensitive.
>>>>
>>>> This feature makes the LBC useless to us. Is there some workaround or plan to address this limitation?
>>>
>>> Complain to your support or sales contact. I've complained about it in the past, and got a "but pins are a limited resource!" response. They need to hear that it's a problem from customers.
>>
>> Done, let's see what I get in return. I think this problem will be a major obstacle for our next-generation boards, which will be NAND based.
>
> It was a big problem, and a big surprise, for me too. The next generation of a couple of the chips on the bus have PCIe, but those are noticeably more expensive.

Can you think of any workaround such as not connecting the BUSY pin at all?
Re: MPC831x (and others?) NAND erase performance improvements
On Wed, 8 Dec 2010 20:57:03 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
> Mark Mason ma...@postdiluvian.org wrote on 2010/12/08 20:26:16:
>> Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>>> Scott Wood scottw...@freescale.com wrote on 2010/12/08 18:18:39:
>>>> On Wed, 8 Dec 2010 08:59:49 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>>>>>> If you attach NAND to the LBC, you should not attach anything else to it which is latency-sensitive.
>>>>>
>>>>> This feature makes the LBC useless to us. Is there some workaround or plan to address this limitation?
>>>>
>>>> Complain to your support or sales contact. I've complained about it in the past, and got a "but pins are a limited resource!" response. They need to hear that it's a problem from customers.
>>>
>>> Done, let's see what I get in return. I think this problem will be a major obstacle for our next-generation boards, which will be NAND based.
>>
>> It was a big problem, and a big surprise, for me too. The next generation of a couple of the chips on the bus have PCIe, but those are noticeably more expensive.
>
> Can you think of any workaround such as not connecting the BUSY pin at all?

Maybe connect the busy pin to a gpio?

-Scott
Re: MPC831x (and others?) NAND erase performance improvements
Scott Wood scottw...@freescale.com wrote on 2010/12/08 20:59:28:
> On Wed, 8 Dec 2010 20:57:03 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>> Mark Mason ma...@postdiluvian.org wrote on 2010/12/08 20:26:16:
>>> Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>>>> Scott Wood scottw...@freescale.com wrote on 2010/12/08 18:18:39:
>>>>> On Wed, 8 Dec 2010 08:59:49 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>>>>>>> If you attach NAND to the LBC, you should not attach anything else to it which is latency-sensitive.
>>>>>>
>>>>>> This feature makes the LBC useless to us. Is there some workaround or plan to address this limitation?
>>>>>
>>>>> Complain to your support or sales contact. I've complained about it in the past, and got a "but pins are a limited resource!" response. They need to hear that it's a problem from customers.
>>>>
>>>> Done, let's see what I get in return. I think this problem will be a major obstacle for our next-generation boards, which will be NAND based.
>>>
>>> It was a big problem, and a big surprise, for me too. The next generation of a couple of the chips on the bus have PCIe, but those are noticeably more expensive.
>>
>> Can you think of any workaround such as not connecting the BUSY pin at all?
>
> Maybe connect the busy pin to a gpio?

Is BUSY required for sane operation or is it an optimization? Is there any risk that the NAND device will drive the LB and corrupt the bus for other devices? I can't tell, haven't studied NAND in detail yet.

Jocke
Re: MPC831x (and others?) NAND erase performance improvements
On Wed, 8 Dec 2010 21:11:08 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
> Scott Wood scottw...@freescale.com wrote on 2010/12/08 20:59:28:
>> On Wed, 8 Dec 2010 20:57:03 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>>> Can you think of any workaround such as not connecting the BUSY pin at all?
>>
>> Maybe connect the busy pin to a gpio?
>
> Is BUSY required for sane operation or is it an optimization?

You could probably get away without it by inserting delays if you know the chip specs well enough.

> Is there any risk that the NAND device will drive the LB and corrupt the bus for other devices?

I think the only thing the NAND chip should be driving is the busy pin, until nCE and nRE are lowered.

-Scott
Re: MPC831x (and others?) NAND erase performance improvements
Scott Wood scottw...@freescale.com wrote on 2010/12/08 21:25:51:
> On Wed, 8 Dec 2010 21:11:08 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>> Scott Wood scottw...@freescale.com wrote on 2010/12/08 20:59:28:
>>> On Wed, 8 Dec 2010 20:57:03 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
>>>> Can you think of any workaround such as not connecting the BUSY pin at all?
>>>
>>> Maybe connect the busy pin to a gpio?
>>
>> Is BUSY required for sane operation or is it an optimization?
>
> You could probably get away without it by inserting delays if you know the chip specs well enough.

Urgh, that does not feel like a good solution. One would have to add big margins to the delays, making the NAND much slower.

>> Is there any risk that the NAND device will drive the LB and corrupt the bus for other devices?
>
> I think the only thing the NAND chip should be driving is the busy pin,

OK, good. What function is actually lost if one uses a GPIO instead of BUSY?

You think Freescale could test and validate a GPIO solution? I don't think we will be very happy to design our board around an unproven workaround.

An even better workaround would be if one could add logic between the NAND and the CPU which would compensate for this defect without needing special SW fixes.
Re: MPC831x (and others?) NAND erase performance improvements
Joakim Tjernlund joakim.tjernl...@transmode.se wrote: Is there any risk that the NAND device will drive the LB and corrupt the bus for other devices? I think the only thing the NAND chip should be driving is the busy pin, OK, good. What function is actually lost if one uses an GPIO instead of BUSY? I tried to take this route, since it was a fairly minor board change. Unfortunately all the GPIOs were already used. I looked long and hard for a way to not have the NAND hold the bus with BUSY. If you don't connect BUSY then you can't use the LBC's flash controller (FCM mode). I haven't done it personally, but I believe that connecting BUSY to a GPIO is very common thing to do, since this is the route you'd have to take if you didn't have a built-in flash controller. The Linux MTD layer supports it. An even better workaround would be if one could add logic between the NAND and the CPU which would compensate for this defect without needing special SW fixes. That probably won't work. Presumably you're talking about something like a gate so the BUSY is only passed from the NAND when the NAND's chip select is asserted. Unfortunately the NAND controller is monitoring the BUSY line, and if it sees the signal deassert then it will think the NAND is done. I don't think that using a software NAND controller instead of the LBC FCM mode is all that bad. Again, I haven't actually done it, so check the MTD docs, but I'm pretty sure the software is meant to do that, so it doesn't even really constitute a fix. Assuming that it is supported then I doubt that configuring the NAND layer to use your setup would be any harder than configuring the FCM. And, U-Boot uses the Linux MTD code, so you'd get the same support there. There might also be a way to keep the BUSY and find a workaround with the other chips on the bus, depending on what they are. 
Chances are that they have a BUSY, but maybe you could move the peripheral's BUSY to another LBC line and use UPM mode to interpret that line as a BUSY. Writing UPM programs isn't really all that difficult (well, not the second time you do it anyway). That's getting into kludge territory, though.
Re: MPC831x (and others?) NAND erase performance improvements
On Wed, 8 Dec 2010 22:26:59 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
> Scott Wood scottw...@freescale.com wrote on 2010/12/08 21:25:51:
> > On Wed, 8 Dec 2010 21:11:08 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
> > > Scott Wood scottw...@freescale.com wrote on 2010/12/08 20:59:28:
> > > > On Wed, 8 Dec 2010 20:57:03 +0100 Joakim Tjernlund joakim.tjernl...@transmode.se wrote:
> > > > > Can you think of any workaround such as not connecting the BUSY pin at all?
> > > >
> > > > Maybe connect the busy pin to a gpio?
> > >
> > > Is BUSY required for sane operation or is it an optimization?
> >
> > You could probably get away without it by inserting delays if you know the chip specs well enough.
>
> Urgh, that does not feel like a good solution.

No, but you asked if it could be done, and if it was just a performance issue. :-)

> > > Is there any risk that the NAND device will drive the LB and corrupt the bus for other devices?
> >
> > I think the only thing the NAND chip should be driving is the busy pin,
>
> OK, good. What function is actually lost if one uses a GPIO instead of BUSY?

Not much, if you enable interrupts on the GPIO pin. The driver would have to be reworked a bit, of course.

> You think Freescale could test and validate a GPIO solution? I don't think we will be very happy to design our board around an unproven workaround.

Ask your sales/support contacts.

> An even better workaround would be if one could add logic between the NAND and the CPU which would compensate for this defect without needing special SW fixes.

The problem with that is when would you assert the chipselect again to check if it's done? Current SW depends on being able to tell the LBC to interrupt (or take other action) when busy goes away. I suppose you could poll with status reads, which could at least be preempted if you've got something higher priority to do with the LBC.

-Scott
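The "poll with status reads" fallback mentioned at the end of this message could look roughly like the sketch below. It is only an illustration under stated assumptions: the callback types, nand_poll_ready() and the fake_nand simulation are hypothetical names invented here and are not part of the MTD or eLBC driver API. The two status bits are the standard NAND status-register bits (they correspond to NAND_STATUS_FAIL and NAND_STATUS_READY in Linux).

```c
#include <stddef.h>
#include <stdint.h>

/* Standard NAND status-register bits. */
#define NAND_STATUS_FAIL  0x01  /* last program/erase failed */
#define NAND_STATUS_READY 0x40  /* chip is ready (not busy)  */

/* Hypothetical callbacks, invented for this sketch. */
typedef uint8_t (*read_status_fn)(void *ctx); /* issue 0x70, read one byte  */
typedef void (*relax_fn)(void *ctx);          /* sleep or let others run    */

/*
 * Poll the status register until READY is set or max_polls is used up.
 * relax() runs between polls, which is where a higher-priority LBC user
 * would get its chance at the bus.
 * Returns 0 on success, -1 if the chip reported FAIL, -2 on timeout.
 */
static int nand_poll_ready(read_status_fn read_status, relax_fn relax,
                           void *ctx, unsigned int max_polls)
{
    unsigned int i;

    for (i = 0; i < max_polls; i++) {
        uint8_t st = read_status(ctx);

        if (st & NAND_STATUS_READY)
            return (st & NAND_STATUS_FAIL) ? -1 : 0;
        if (relax)
            relax(ctx);
    }
    return -2;
}

/* Simulated chip: reports busy for a number of reads, then final_status. */
struct fake_nand {
    unsigned int busy_reads_left;
    uint8_t final_status;
};

static uint8_t fake_read_status(void *ctx)
{
    struct fake_nand *n = ctx;

    if (n->busy_reads_left > 0) {
        n->busy_reads_left--;
        return 0x00;            /* READY bit clear: still busy */
    }
    return n->final_status;
}
```

With a simulated chip that stays busy for three reads, nand_poll_ready(fake_read_status, NULL, &chip, 10) returns 0. In a real driver the relax step would presumably be a short sleep, so the trade-off is extra status-read traffic on the bus against never holding the bus for the whole erase.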
Re: MPC831x (and others?) NAND erase performance improvements
On Wed, 8 Dec 2010 17:02:45 -0500 Mark Mason ma...@postdiluvian.org wrote:
> I don't think that using a software NAND controller instead of the LBC FCM mode is all that bad. Again, I haven't actually done it, so check the MTD docs, but I'm pretty sure the software is meant to do that, so it doesn't even really constitute a fix. Assuming that it is supported then I doubt that configuring the NAND layer to use your setup would be any harder than configuring the FCM.

The MTD layer supports some really simple NAND controllers, but what do you mean by not having a controller at all? Hooking everything up to GPIO? Using UPM? There is already a UPM NAND driver, BTW. You would lose hardware ECC and the ability to be interrupt-driven (the latter should be possible with SW changes, using GPIO interrupts).

-Scott
RE: MPC831x (and others?) NAND erase performance improvements
> The problem cropped up when there was a lot of traffic to the NAND (Samsung K9WAGU08U1B-PIB0), with the NAND being on the LBC along with a video chip that needed constant and prompt attention. What I would see is that, as the writes happened, the erases would wind up batched and issued all at once, such that frequently 400-700 erases were issued in rapid succession with a 1ms LBC BUSY cycle per erase.

Are those just the reads of the status register polling to determine when the sector erase has completed? In which case a software delay between the reads might work. Writes probably also have to be polled, but the individual writes happen faster. It is possible that an uncached read of another memory area will stall the cpu long enough to allow another LBC master in. One every few writes might be enough. I had to do something similar on rather different hardware ...

David
Re: MPC831x (and others?) NAND erase performance improvements
David Laight david.lai...@aculab.com wrote:
> > The problem cropped up when there was a lot of traffic to the NAND (Samsung K9WAGU08U1B-PIB0), with the NAND being on the LBC along with a video chip that needed constant and prompt attention. What I would see is that, as the writes happened, the erases would wind up batched and issued all at once, such that frequently 400-700 erases were issued in rapid succession with a 1ms LBC BUSY cycle per erase.
>
> Are those just the reads of the status register polling to determine when the sector erase has completed?

No, it's not, since it isn't polling the status register. It's using a hardware line from the NAND to indicate that the NAND is busy. That one hardware line is shared between all devices on the bus, so if one device says it's busy then all bus traffic stops until the NAND deasserts the busy line.
Re: MPC831x (and others?) NAND erase performance improvements
On Mon, 6 Dec 2010 22:15:54 -0500 Mark Mason ma...@postdiluvian.org wrote: A few months ago I ran into some performance problems involving UBI/NAND erases holding other devices off the LBC on an MPC8315. I found a solution for this, which worked well, at least with the hardware I was working with. I suspect the same problem affects other PPCs, probably including multicore devices, and maybe other architectures as well. I don't have experience with similar NAND controllers on other devices, so I'd like to explain what I found and see if someone who's more familiar with the family and/or driver can tell if this is useful. The problem cropped up when there was a lot of traffic to the NAND (Samsung K9WAGU08U1B-PIB0), with the NAND being on the LBC along with a video chip that needed constant and prompt attention. If you attach NAND to the LBC, you should not attach anything else to it which is latency-sensitive. What I found, though, was that the NAND did not inherently assert BUSY as part of the erase - BUSY was asserted because the driver polled for the status (NAND_CMD_STATUS). If the status poll was delayed for the duration of the erase then the MPC could talk to the video chip while the erase was in progress. At the end of the 1ms delay I would then poll for status, which would complete effectively immediately. That's what we originially did. The problem is that during this interval the NAND chip will be driving the busy pin, which corrupts other LBC transactions. Newer chips have this added text in their reference manuals under NAND Flash Block Erase Command Sequence Example: Note that operations specified by OP3 and OP4 (status read) should never be skipped while erasing a NAND Flash device, because, in case that happens, contention may arise on LGPL4. A possible case is that the next transaction from eLBC may try to use that pin as an output and since the NAND Flash device might already be driving it, contention will occur. 
In case OP3 and OP4 operations are skipped, it may also happen that a new command is issued to the NAND Flash device even when the device has not yet finished processing the previous request. This may also result in unpredictable behavior. Here's a code snippet from 2.6.37, with some comments I added. drivers/mtd/nand/fsl_elbc_nand.c - fsl_elbc_cmdfunc():

    /* ERASE2 uses the block and page address from ERASE1 */
    case NAND_CMD_ERASE2:
            dev_vdbg(priv->dev, "fsl_elbc_cmdfunc: NAND_CMD_ERASE2.\n");

            out_be32(&lbc->fir,
                     (FIR_OP_CM0 << FIR_OP0_SHIFT) | /* Execute CMD0 (ERASE1). */
                     (FIR_OP_PA  << FIR_OP1_SHIFT) | /* Issue block and page address. */
                     (FIR_OP_CM2 << FIR_OP2_SHIFT) | /* Execute CMD2 (ERASE2). */
                     /* (delay needed here - this is where the erase happens) */
                     (FIR_OP_CW1 << FIR_OP3_SHIFT) | /* Wait for LFRB (BUSY) to deassert */
                                                     /* then issue CW1 (read status). */
                     (FIR_OP_RS  << FIR_OP4_SHIFT)); /* Read one byte. */

            out_be32(&lbc->fcr,
                     (NAND_CMD_ERASE1 << FCR_CMD0_SHIFT) | /* 0x60 */
                     (NAND_CMD_STATUS << FCR_CMD1_SHIFT) | /* 0x70 */
                     (NAND_CMD_ERASE2 << FCR_CMD2_SHIFT)); /* 0xD0 */

            out_be32(&lbc->fbcr, 0);
            elbc_fcm_ctrl->read_bytes = 0;
            elbc_fcm_ctrl->use_mdr = 1;

            fsl_elbc_run_command(mtd);
            return;

What I did was to issue two commands with fsl_elbc_run_command(), with a 1ms sleep in between (a tightloop delay worked almost as well, the important part was having 1ms between the erase and the status poll). The first command did the FIR_OP_CM0 (NAND_CMD_ERASE1), FIR_OP_PA, and FIR_OP_CM2 (NAND_CMD_ERASE2). The second did the FIR_OP_CW1 (NAND_CMD_STATUS) and FIR_OP_RS. So essentially, you reverted commit 476459a6cf46d20ec73d9b211f3894ced5f9871e :-) Except for the 1ms delay. I know almost nothing at all about the scheduler, but I'm pretty sure that this behavior would cause the scheduler to think the video thread was a CPU hog, since the video thread was running for 1ms for every 20us that the UBI BGT ran, which would cause the scheduler to unfairly prefer the UBI BGT.
I initially tried to address this problem with thread priorities, but the unfortunate reality was that either the NAND writes could fall behind or the video chip could fall behind, and there wasn't spare bandwidth to allow either. If you set a realtime priority and have preemption enabled, you should be able to avoid being delayed by more than one NAND transaction, until the realtime thread sleeps. Be careful to ensure that it does sleep enough for other things to run. -Scott
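For readers who want to try the realtime-priority suggestion, here is a minimal userspace sketch. It assumes the latency-sensitive work runs in a userspace thread, which the thread does not actually state; a kernel thread would need a different mechanism. The helper names clamp_fifo_priority() and make_self_realtime() are hypothetical, while sched_setscheduler(), sched_get_priority_min() and sched_get_priority_max() are the standard POSIX calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sched.h>

/* Clamp a requested priority into the range accepted for SCHED_FIFO. */
static int clamp_fifo_priority(int prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    if (prio < lo)
        return lo;
    if (prio > hi)
        return hi;
    return prio;
}

/*
 * Move the calling thread to SCHED_FIFO at (clamped) priority prio.
 * Returns 0 on success or a negative errno value; without sufficient
 * privileges the call is expected to fail.
 */
static int make_self_realtime(int prio)
{
    struct sched_param sp;

    sp.sched_priority = clamp_fifo_priority(prio);
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return -errno;
    return 0;
}
```

As the message above warns, a SCHED_FIFO thread that never sleeps can starve everything else, so the thread calling make_self_realtime() has to block regularly.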
Re: MPC831x (and others?) NAND erase performance improvements
Scott Wood scottw...@freescale.com wrote: On Mon, 6 Dec 2010 22:15:54 -0500 Mark Mason ma...@postdiluvian.org wrote: A few months ago I ran into some performance problems involving UBI/NAND erases holding other devices off the LBC on an MPC8315. I found a solution for this, which worked well, at least with the hardware I was working with. I suspect the same problem affects other PPCs, probably including multicore devices, and maybe other architectures as well. I don't have experience with similar NAND controllers on other devices, so I'd like to explain what I found and see if someone who's more familiar with the family and/or driver can tell if this is useful. The problem cropped up when there was a lot of traffic to the NAND (Samsung K9WAGU08U1B-PIB0), with the NAND being on the LBC along with a video chip that needed constant and prompt attention. If you attach NAND to the LBC, you should not attach anything else to it which is latency-sensitive. We found that out the hard way. The 1ms latency wasn't a problem by itself, the real problem was that the quantity of erases issued in a short time significantly decreased the bandwidth available, and that the scheduler saw the video thread use 1ms of CPU time even though it'd only done a couple hundred nanoseconds worth of work. What I found, though, was that the NAND did not inherently assert BUSY as part of the erase - BUSY was asserted because the driver polled for the status (NAND_CMD_STATUS). If the status poll was delayed for the duration of the erase then the MPC could talk to the video chip while the erase was in progress. At the end of the 1ms delay I would then poll for status, which would complete effectively immediately. That's what we originially did. The problem is that during this interval the NAND chip will be driving the busy pin, which corrupts other LBC transactions. This is not what we observed with our flash part. 
For a page erase, the NAND did not assert the busy pin until the status read was done. This was confirmed with a logic analyzer, and taking advantage of this behavior is the sole purpose of the change. I don't think that this behavior is what's described in the Samsung datasheet, but it is what our parts did. I incorrectly said "polled for status" in my original post. It did not poll for status, it monitored the busy line from NAND and did a single read from the status register. Newer chips have this added text in their reference manuals under NAND Flash Block Erase Command Sequence Example: Note that operations specified by OP3 and OP4 (status read) should never be skipped while erasing a NAND Flash device, because, in case that happens, contention may arise on LGPL4. A possible case is that the next transaction from eLBC may try to use that pin as an output and since the NAND Flash device might already be driving it, contention will occur. In case OP3 and OP4 operations are skipped, it may also happen that a new command is issued to the NAND Flash device even when the device has not yet finished processing the previous request. This may also result in unpredictable behavior. I would expect those operations to be mandatory. Here's a code snippet from 2.6.37, with some comments I added. drivers/mtd/nand/fsl_elbc_nand.c - fsl_elbc_cmdfunc():

    /* ERASE2 uses the block and page address from ERASE1 */
    case NAND_CMD_ERASE2:
            dev_vdbg(priv->dev, "fsl_elbc_cmdfunc: NAND_CMD_ERASE2.\n");

            out_be32(&lbc->fir,
                     (FIR_OP_CM0 << FIR_OP0_SHIFT) | /* Execute CMD0 (ERASE1). */
                     (FIR_OP_PA  << FIR_OP1_SHIFT) | /* Issue block and page address. */
                     (FIR_OP_CM2 << FIR_OP2_SHIFT) | /* Execute CMD2 (ERASE2). */
                     /* (delay needed here - this is where the erase happens) */
                     (FIR_OP_CW1 << FIR_OP3_SHIFT) | /* Wait for LFRB (BUSY) to deassert */
                                                     /* then issue CW1 (read status). */
                     (FIR_OP_RS  << FIR_OP4_SHIFT)); /* Read one byte. */

            out_be32(&lbc->fcr,
                     (NAND_CMD_ERASE1 << FCR_CMD0_SHIFT) | /* 0x60 */
                     (NAND_CMD_STATUS << FCR_CMD1_SHIFT) | /* 0x70 */
                     (NAND_CMD_ERASE2 << FCR_CMD2_SHIFT)); /* 0xD0 */

            out_be32(&lbc->fbcr, 0);
            elbc_fcm_ctrl->read_bytes = 0;
            elbc_fcm_ctrl->use_mdr = 1;

            fsl_elbc_run_command(mtd);
            return;

What I did was to issue two commands with fsl_elbc_run_command(), with a 1ms sleep in between (a tightloop delay worked almost as well, the important part was having 1ms between the erase and the status poll). The first command did the FIR_OP_CM0 (NAND_CMD_ERASE1), FIR_OP_PA, and FIR_OP_CM2 (NAND_CMD_ERASE2). The second did the FIR_OP_CW1 (NAND_CMD_STATUS) and FIR_OP_RS. So essentially, you reverted commit 476459a6cf46d20ec73d9b211f3894ced5f9871e :-) Except for the 1ms delay.
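To make the difference between the stock sequence and the two-command split concrete, the sketch below builds the FIR and FCR words as plain integers. The opcode and shift values are written from memory of the kernel's fsl_lbc.h and should be treated as assumptions to check against that header. The slot placement in the split variant (CW1 in OP0, RS in OP1) is also a guess, since the post does not say which slots were used, and the helper function names are hypothetical. Note that the reference-manual text quoted in this thread says OP3 and OP4 should never be skipped, so the split is shown only to illustrate what was described, not as a recommended change.

```c
#include <stdint.h>

/* Assumed values; verify against arch/powerpc/include/asm/fsl_lbc.h. */
#define FIR_OP0_SHIFT 28
#define FIR_OP1_SHIFT 24
#define FIR_OP2_SHIFT 20
#define FIR_OP3_SHIFT 16
#define FIR_OP4_SHIFT 12
#define FIR_OP_PA   0x2u  /* issue block+page address        */
#define FIR_OP_CM0  0x4u  /* issue command from FCR[CMD0]    */
#define FIR_OP_CM2  0x6u  /* issue command from FCR[CMD2]    */
#define FIR_OP_RS   0xBu  /* read status byte into MDR       */
#define FIR_OP_CW1  0xDu  /* wait for LFRB, issue FCR[CMD1]  */

#define FCR_CMD0_SHIFT 24
#define FCR_CMD1_SHIFT 16
#define FCR_CMD2_SHIFT 8
#define NAND_CMD_ERASE1 0x60u
#define NAND_CMD_STATUS 0x70u
#define NAND_CMD_ERASE2 0xD0u

/* Stock 2.6.37 sequence: erase, wait for busy, read status, all in one FIR. */
static uint32_t erase_fir_combined(void)
{
    return (FIR_OP_CM0 << FIR_OP0_SHIFT) |
           (FIR_OP_PA  << FIR_OP1_SHIFT) |
           (FIR_OP_CM2 << FIR_OP2_SHIFT) |
           (FIR_OP_CW1 << FIR_OP3_SHIFT) |
           (FIR_OP_RS  << FIR_OP4_SHIFT);
}

/* First half of the split: issue the erase and stop (remaining slots NOP). */
static uint32_t erase_fir_issue_only(void)
{
    return (FIR_OP_CM0 << FIR_OP0_SHIFT) |
           (FIR_OP_PA  << FIR_OP1_SHIFT) |
           (FIR_OP_CM2 << FIR_OP2_SHIFT);
}

/* Second half, run after the ~1ms delay: wait for busy, then read status. */
static uint32_t erase_fir_status_only(void)
{
    return (FIR_OP_CW1 << FIR_OP0_SHIFT) |
           (FIR_OP_RS  << FIR_OP1_SHIFT);
}

/* The FCR command bytes are the same for both variants. */
static uint32_t erase_fcr(void)
{
    return (NAND_CMD_ERASE1 << FCR_CMD0_SHIFT) |
           (NAND_CMD_STATUS << FCR_CMD1_SHIFT) |
           (NAND_CMD_ERASE2 << FCR_CMD2_SHIFT);
}
```

With these assumed values the combined word is 0x426DB000 and the two halves are 0x42600000 and 0xDB000000. The point of contention in the thread is the gap between the two halves: the LBC no longer treats erase and status read as one atomic transaction, so other bus traffic can run while the NAND may be driving the busy pin.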
Re: MPC831x (and others?) NAND erase performance improvements
On Tue, 7 Dec 2010 18:24:45 -0500 Mark Mason ma...@postdiluvian.org wrote: Scott Wood scottw...@freescale.com wrote: On Mon, 6 Dec 2010 22:15:54 -0500 Mark Mason ma...@postdiluvian.org wrote: What I found, though, was that the NAND did not inherently assert BUSY as part of the erase - BUSY was asserted because the driver polled for the status (NAND_CMD_STATUS). If the status poll was delayed for the duration of the erase then the MPC could talk to the video chip while the erase was in progress. At the end of the 1ms delay I would then poll for status, which would complete effectively immediately. That's what we originially did. The problem is that during this interval the NAND chip will be driving the busy pin, which corrupts other LBC transactions. This is not what we observed with our flash part. For a page erase, the NAND did not assert the busy pin until the status read was done. This was confirmed with a logic analyzer, and taking advantage of this behavior is the sole purpose of the change. How would that work, in the normal case where you wait for busy to go away before reading status? We observed this corruption happening. It was the motivation for commit 476459a6cf46d20ec73d9b211f3894ced5f9871e. Newer chips have this added text in their reference manuals under NAND Flash Block Erase Command Sequence Example: Note that operations specified by OP3 and OP4 (status read) should never be skipped while erasing a NAND Flash device, because, in case that happens, contention may arise on LGPL4. A possible case is that the next transaction from eLBC may try to use that pin as an output and since the NAND Flash device might already be driving it, contention will occur. In case OP3 and OP4 operations are skipped, it may also happen that a new command is issued to the NAND Flash device even when the device has not yet finished processing the previous request. This may also result in unpredictable behavior. I would expect those operations to be mandatory. 
...but you remove them from the original FIR. They're not just mandatory to be done eventually, it has to be done within the one transaction that is atomic at the LBC. I know almost nothing at all about the scheduler, but I'm pretty sure that this behavior would cause the scheduler to think the video thread was a CPU hog, since the video thread was running for 1ms for every 20us that the UBI BGT ran, which would cause the scheduler to unfairly prefer the UBI BGT. I initially tried to address this problem with thread priorities, but the unfortunate reality was that either the NAND writes could fall behind or the video chip could fall behind, and there wasn't spare bandwidth to allow either. If you set a realtime priority and have preemption enabled, you should be able to avoid being delayed by more than one NAND transaction, until the realtime thread sleeps. Be careful to ensure that it does sleep enough for other things to run. I tried that, but if the erases were held off enough to get the other bus bandwidth we required then the NAND writes fell behind and the kernel oom'd. Another possibility (but still hackish) is to have the NAND driver poll rather than be interrupt driven, and disable interrupts while polling. Then, whenever anything else runs (assuming no SMP), it should be safe to access LBC without high latency -- so the scheduler shouldn't get confused. To make it somewhat cleaner, provide the same benefit on SMP, and allow non-LBC things to run, you could use a mutex to synchronize between all LBC users, though that's more work and could be a nuisance if you want to access the LBC from an interrupt handler. -Scott
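The mutex idea at the end of this message can be pictured with a small userspace analogue. This is only a sketch: it uses a pthread mutex as a stand-in for a kernel mutex, and lbc_lock, lbc_do_locked() and the two fake operations are hypothetical names invented for illustration. In the kernel the equivalent would presumably use DEFINE_MUTEX with mutex_lock() and mutex_unlock(), which may sleep and so cannot be taken from an interrupt handler, the nuisance mentioned above.

```c
#include <pthread.h>

/* One lock shared by every user of the (simulated) local bus. */
static pthread_mutex_t lbc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical signature for "one complete bus transaction". */
typedef int (*lbc_op_fn)(void *arg);

/*
 * Run one bus operation with the lock held, so a NAND erase and a video
 * chip access never overlap. A waiter sleeps on the mutex instead of
 * stalling inside a bus access.
 */
static int lbc_do_locked(lbc_op_fn op, void *arg)
{
    int ret;

    pthread_mutex_lock(&lbc_lock);
    ret = op(arg);
    pthread_mutex_unlock(&lbc_lock);
    return ret;
}

/* Stand-ins for real bus users; each just counts how often it ran. */
static int fake_nand_erase(void *arg)
{
    int *erases = arg;

    (*erases)++;
    return 0;
}

static int fake_video_write(void *arg)
{
    int *writes = arg;

    (*writes)++;
    return 0;
}
```

A caller would wrap every access, for example lbc_do_locked(fake_nand_erase, &erases) from the NAND path and lbc_do_locked(fake_video_write, &writes) from the video path. The design benefit is that time spent waiting for the bus is accounted as sleeping rather than running, which is the scheduler confusion described earlier in the thread.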
MPC831x (and others?) NAND erase performance improvements
A few months ago I ran into some performance problems involving UBI/NAND erases holding other devices off the LBC on an MPC8315. I found a solution for this, which worked well, at least with the hardware I was working with. I suspect the same problem affects other PPCs, probably including multicore devices, and maybe other architectures as well. I don't have experience with similar NAND controllers on other devices, so I'd like to explain what I found and see if someone who's more familiar with the family and/or driver can tell if this is useful. The problem cropped up when there was a lot of traffic to the NAND (Samsung K9WAGU08U1B-PIB0), with the NAND being on the LBC along with a video chip that needed constant and prompt attention. What I would see is that, as the writes happened, the erases would wind up batched and issued all at once, such that frequently 400-700 erases were issued in rapid succession with a 1ms LBC BUSY cycle per erase. BUSY was shared with all of the devices on the LBC, so the PPC could not talk to the video chip as long as BUSY was asserted by the NAND. This would give us a window of up to 700ms in which the PPC could manage very little communication with other devices on the LBC - in our case the video chip, for which this delay was essentially fatal. I suspect that some multicore chips might have one core effectively halt if that core attempts to access the LBC while the other core (or itself, for that matter) is executing an erase (if they have a similar NAND controller). What I found, though, was that the NAND did not inherently assert BUSY as part of the erase - BUSY was asserted because the driver polled for the status (NAND_CMD_STATUS). If the status poll was delayed for the duration of the erase then the MPC could talk to the video chip while the erase was in progress. At the end of the 1ms delay I would then poll for status, which would complete effectively immediately. 
Here's a code snippet from 2.6.37, with some comments I added. drivers/mtd/nand/fsl_elbc_nand.c - fsl_elbc_cmdfunc():

    /* ERASE2 uses the block and page address from ERASE1 */
    case NAND_CMD_ERASE2:
            dev_vdbg(priv->dev, "fsl_elbc_cmdfunc: NAND_CMD_ERASE2.\n");

            out_be32(&lbc->fir,
                     (FIR_OP_CM0 << FIR_OP0_SHIFT) | /* Execute CMD0 (ERASE1). */
                     (FIR_OP_PA  << FIR_OP1_SHIFT) | /* Issue block and page address. */
                     (FIR_OP_CM2 << FIR_OP2_SHIFT) | /* Execute CMD2 (ERASE2). */
                     /* (delay needed here - this is where the erase happens) */
                     (FIR_OP_CW1 << FIR_OP3_SHIFT) | /* Wait for LFRB (BUSY) to deassert */
                                                     /* then issue CW1 (read status). */
                     (FIR_OP_RS  << FIR_OP4_SHIFT)); /* Read one byte. */

            out_be32(&lbc->fcr,
                     (NAND_CMD_ERASE1 << FCR_CMD0_SHIFT) | /* 0x60 */
                     (NAND_CMD_STATUS << FCR_CMD1_SHIFT) | /* 0x70 */
                     (NAND_CMD_ERASE2 << FCR_CMD2_SHIFT)); /* 0xD0 */

            out_be32(&lbc->fbcr, 0);
            elbc_fcm_ctrl->read_bytes = 0;
            elbc_fcm_ctrl->use_mdr = 1;

            fsl_elbc_run_command(mtd);
            return;

What I did was to issue two commands with fsl_elbc_run_command(), with a 1ms sleep in between (a tight-loop delay worked almost as well, the important part was having 1ms between the erase and the status poll). The first command did the FIR_OP_CM0 (NAND_CMD_ERASE1), FIR_OP_PA, and FIR_OP_CM2 (NAND_CMD_ERASE2). The second did the FIR_OP_CW1 (NAND_CMD_STATUS) and FIR_OP_RS. For a bit more detail... fsl_elbc_run_command() would put the thread issuing the erase to sleep so other threads could run. That did work as planned, except that I was working with a fairly pathological case - there was a very high volume of writes to the NAND, and the video chip required very frequent and prompt attention. This meant that the thread that was most likely to run when the NAND erase was in progress was the thread that serviced the video chip. A logic analyzer backed this up.
It would show the erase being issued, BUSY (R/B# or LFRB) being asserted for 1ms, one or two 16 bit transactions to the video chip, then another erase, repeating this process hundreds of times in a row. The UBI BGT would run long enough to issue an erase (probably on the order of 20us) then go to sleep. The video thread would then run, and issue a transaction to the chip. That transaction would get blocked until BUSY deasserted, at which point the thread would appear to have run for 1ms, even though it had only executed a single bus transaction. I know almost nothing at all about the scheduler, but I'm pretty sure that this behavior would cause the scheduler to think the video thread was a CPU hog, since the video thread was running for 1ms for every 20us that the UBI BGT ran, which would cause the scheduler to unfairly prefer the UBI BGT. I initially tried to address this problem with thread priorities, but the unfortunate reality was that either the NAND writes could fall behind or the video chip could