Wow, this is a great discovery, even if there is still room for improvement!
Does this only concern GTA02, or can something similar be applied to GTA01 as well?

/Micael

On Mon, Oct 20, 2008 at 8:46 PM, Harald Welte <[EMAIL PROTECTED]> wrote:
> Hi!
>
> As part of Swisscom's efforts to speed up the boot process, I discovered
> that the NAND timings that GTA02 uses are very suboptimal. The actual
> access cycle for one byte is 45ns, but current u-boot and kernel code
> use access cycles up to 190ns per byte.
>
> The NAND timing calculations are as follows (values given for timings
> after my patches are applied):
>
> tR = 25us
> tACLS = 0ns
> tWRPH0 = 30ns
> tWRPH1 = 20ns
>
> cmd_addr_cycle = tACLS + tWRPH0 + tWRPH1 = 50ns
> data_cycle = tWRPH0 + tWRPH1 = 50ns
> read = 0x00 + 5*cmd_addr_cycle + 0x30 + tR + (2048+64)*data_cycle
> read = 7 * cmd_addr_cycle + 25us + 2112 * data_cycle
> read = 7 * 50ns + 25us + 2112 * 50ns
> read = 130.95us (15.639 MByte/sec)
>
> The latter is the theoretical maximum read performance of the NAND
> flash that GTA02 uses.
>
> If we use the timings of the various existing bits of code, then we get the
> following results:
>
> read_old_uboot  (30/80/80)  364.25us   5.623 MByte/sec
> read_old_kernel (30/70/30)  237.11us   8.637 MByte/sec
> read_new_kernel (0/30/20)   130.95us  15.639 MByte/sec
> theoretical     (0/25/15)   120.36us  17.016 MByte/sec
>
> Therefore, by using the correct timings, I expect an 81% improvement
> of the theoretical read performance.
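
The read-time formula quoted above can be cross-checked with a short script. This is only a minimal sketch in Python: the function and variable names are my own and do not come from u-boot or the kernel, and it assumes MByte means 10^6 bytes and that only the 2048 payload bytes count towards throughput, which is what reproduces the quoted figures to within rounding.

```python
# Sketch: recompute the page-read times from the quoted formula.
# All names here are illustrative assumptions, not kernel or u-boot identifiers.

T_R_NS = 25_000        # tR, time to fetch a page into the NAND data register
PAGE_BYTES = 2048      # payload bytes per page
OOB_BYTES = 64         # spare-area bytes read along with each page
CMD_ADDR_CYCLES = 7    # 0x00 command + 5 address cycles + 0x30 command


def page_read_ns(tacls, twrph0, twrph1):
    """Time in ns to read one page (payload + OOB) for the given timings."""
    cmd_addr_cycle = tacls + twrph0 + twrph1
    data_cycle = twrph0 + twrph1
    return (CMD_ADDR_CYCLES * cmd_addr_cycle
            + T_R_NS
            + (PAGE_BYTES + OOB_BYTES) * data_cycle)


def mbyte_per_sec(read_ns):
    """Payload throughput; bytes per ns times 1000 gives 10^6 bytes per second."""
    return PAGE_BYTES * 1000 / read_ns


# (tACLS, tWRPH0, tWRPH1) in ns, as listed in the quoted table
TIMINGS = {
    "read_old_uboot": (30, 80, 80),
    "read_old_kernel": (30, 70, 30),
    "read_new_kernel": (0, 30, 20),
}

for name, timing in TIMINGS.items():
    ns = page_read_ns(*timing)
    print(f"{name:16s} {ns / 1000:8.2f}us {mbyte_per_sec(ns):7.3f} MByte/sec")
```

One observation from running the numbers: the quoted 120.36us "theoretical" row corresponds to a 45ns cycle (7*45ns + 25us + 2112*45ns), which matches the 45ns access cycle mentioned at the top of the mail, whereas the 0/25/15 label would add up to a 40ns cycle.
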
>
> But let's look at some measurements:
>
> old 2.6.24 kernel:
> ========================
> s3c2440-nand s3c2440-nand: Tacls=3, 30ns Twrph0=7 70ns, Twrph1=3 30ns
> [EMAIL PROTECTED]:/dev# time dd if=/dev/mtd6 of=/dev/null
> 505088+0 records in
> 505088+0 records out
> real    0m 54.16s
> user    0m 0.48s
> sys     0m 53.25s
> [EMAIL PROTECTED]:/dev# time dd bs=2048 if=/dev/mtd6 of=/dev/null
> 126272+0 records in
> 126272+0 records out
> real    0m 51.05s
> user    0m 0.08s
> sys     0m 50.37s
> ========================
> result (based on 512byte dd): 505088*512 = 252.544 MByte => 4.646MByte/sec
>
> Thus, the actual performance is somewhere around 53% of the theoretical
> speed, given the timings of the old (current) GTA02 kernel source. This
> is disappointing, and requires further investigation.
>
> new 2.6.24 kernel (using my timing related patches):
> ========================
> s3c2440-nand s3c2440-nand: Tacls=1, 10ns Twrph0=3 30ns, Twrph1=2 20ns
> [EMAIL PROTECTED]:~# time dd if=/dev/mtd6 of=/dev/null
> 505088+0 records in
> 505088+0 records out
> real    0m 38.31s
> user    0m 0.36s
> sys     0m 37.93s
> [EMAIL PROTECTED]:~# time dd bs=2048 if=/dev/mtd6 of=/dev/null
> 126272+0 records in
> 126272+0 records out
> real    0m 35.18s
> user    0m 0.12s
> sys     0m 35.03s
> ========================
> result (based on 512byte dd): 6.592MByte/sec (41.9% speed-up)
>
> So instead of a calculated expected 81% improvement, we only get 41.9%.
>
> Still quite significant.
>
> Comparing the theoretical throughput for the new timings with the actual
> throughput of the new timings, we only get to 42% of what should be possible.
>
> In order to see how much effect hardware ECC has, I tried a kernel with
> 'hardware_ecc=1' in the bootargs:
>
> new 2.6.24 kernel with hwecc:
> ========================
> [EMAIL PROTECTED]:~# time dd if=/dev/mtd6 of=/dev/null
> 505088+0 records in
> 505088+0 records out
> real    0m 27.46s
> user    0m 0.37s
> sys     0m 27.09s
> ========================
> result (based on 512byte dd): 9.197MByte/sec (98% speed-up)
>
> So with hardware-ECC and the new timings we get a 98% speed-up compared
> to the original kernel.
>
> And comparing new timings with soft-ecc and hard-ecc, we see a 39%
> improvement.
>
> Finally, using HWECC we get to 58% of the theoretical throughput. This is
> already good, but there's still a software bottleneck somewhere.
>
> Furthermore, the CPU load during NAND read is still close to 100%. To some
> extent, this is expected. The S3C24xx NAND controller cannot do DMA and
> thus we need to read each word from the controller (PIO).
>
> However, I don't think that all of the time is spent copying data, but rather
> polling for when data is finished. The s3c244x (not 2410) supports an RnB
> interrupt which should solve this issue.
>
> The mainline kernel NAND code doesn't have infrastructure for this yet,
> but I'm working on this right now.
>
> In any case, I'd recommend testing and applying my patches. A 41.9% increase
> in NAND performance is probably of good use to every GTA02 user :)
>
> Cheers,
> --
> - Harald Welte <[EMAIL PROTECTED]> http://openmoko.org/
> ============================================================================
> Software for the world's first truly open Free Software mobile phone
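
For anyone who wants to redo the throughput arithmetic from the dd runs quoted above, here is a minimal Python sketch. The names are my own, and I am assuming the mail's "252.544 MByte" means bytes / 1024 / 1000 (that mixed unit is the only way I can reproduce the figure from 505088*512), so I keep it to stay comparable with the quoted results.

```python
# Sketch: derive throughput and speed-up from the quoted dd wall-clock times.
# Names and the unit convention are assumptions made to match the mail's figures.

RECORDS = 505088       # 512-byte records reported by dd
RECORD_BYTES = 512
# Mixed unit (bytes / 1024 / 1000) that reproduces the quoted 252.544 "MByte".
TOTAL_MBYTE = RECORDS * RECORD_BYTES / 1024 / 1000

THEORETICAL_NEW = 15.639  # MByte/sec for the new timings, taken from the mail

# "real" times in seconds of the 512-byte dd runs
RUNS = {
    "old timings, soft ECC": 54.16,
    "new timings, soft ECC": 38.31,
    "new timings, hw ECC": 27.46,
}


def throughput(seconds):
    """Read throughput in the mail's MByte/sec unit."""
    return TOTAL_MBYTE / seconds


baseline = throughput(RUNS["old timings, soft ECC"])
for label, seconds in RUNS.items():
    rate = throughput(seconds)
    speedup = (rate / baseline - 1) * 100
    share = rate / THEORETICAL_NEW * 100
    print(f"{label:22s} {rate:6.3f} MByte/sec  "
          f"{speedup:5.1f}% vs old  {share:4.1f}% of theoretical")
```

With this convention the two new-kernel runs come out at the quoted 6.592 and 9.197 MByte/sec. The old-kernel run comes out near 4.66 rather than the quoted 4.646, so speed-ups computed this way may differ from the quoted percentages by a fraction of a point.
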
