Poor IDE performance on Linux 2.6.x

2004-08-16 Thread Felix Domke

Hi,

i'm using Linux 2.6.8-rc4 (linuxppc tree) on a Pallas-based board
(PPC405 core plus a Set-Top-Box-specialized SoC, similar to Redwood 5).

The IDE driver in use is ibm_ocp_ide.c, in UDMA-33 mode.

Measured with "hdparm -t", we get a HDD performance of about 11MB/s.
With an older kernel like 2.4.20, the performance on the same hardware
was about 22MB/s, i.e. twice as high.

I tried different IO schedulers, but, as expected with only one process
accessing the harddisk, there was no difference. The IDE driver seems to
be ok: i made some measurements, and the time from "ide_do_rw_disk"
until the end of the IDE irq isn't longer than expected. It gives a raw
IDE performance of about 29MB/s, which is near the theoretical limit of
33MB/s of the UDMA bus. The harddisk itself doesn't seem to be the
limit, as it delivers more than 11MB/s and seems to prefetch, so that
the next data is already read from disk into the drive's cache when the
DMA transfer starts. The first DMA transfers are slower, probably due to
seek time and real read time.

The measured time (i won't give exact numbers, as they depend on the
transferred size and the time required for the printks) included the IDE
command processing time (i.e. the time from issuing the IDE command
until the IDE device asserted DRQ), so it is a kind of worst-case
timing.

The problem seems to be the delay between the successful termination of
the read command and the next call of ide_do_rw_disk. Mainly because i
don't know the IO subsystem of the kernel very well, i was unable to
track down what's going on there.
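
The numbers above can be put into a small model: if every request is
transferred at the raw rate and then followed by an idle gap, the gap
can be computed back from the measured rate. This is only a sketch; the
function names and the request size used in the note below are
assumptions for illustration, not values taken from the driver.

```c
#include <assert.h>

/* Effective throughput in MB/s when each request of `request_mb`
 * megabytes is moved at `raw_mbps` and then followed by an idle gap
 * of `gap_s` seconds before the next request starts. */
double effective_mbps(double raw_mbps, double request_mb, double gap_s)
{
    assert(raw_mbps > 0.0 && request_mb > 0.0 && gap_s >= 0.0);
    return request_mb / (request_mb / raw_mbps + gap_s);
}

/* Idle gap per request (in seconds) implied by a measured rate that
 * is lower than the raw transfer rate. */
double implied_gap_s(double raw_mbps, double measured_mbps, double request_mb)
{
    assert(raw_mbps > 0.0 && measured_mbps > 0.0);
    assert(measured_mbps <= raw_mbps);
    return request_mb * (1.0 / measured_mbps - 1.0 / raw_mbps);
}
```

With 29MB/s raw and 11MB/s measured, the bus would be unused for about
1 - 11/29, i.e. roughly 62% of the wall time. For a hypothetical request
size of 128k (0.125MB), implied_gap_s(29.0, 11.0, 0.125) comes out at
roughly 7ms per request.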

I hacked the kernel profiler to use a critical interrupt (available on
4xx) and an on-cpu compare timer, so i was able to profile even in IRQ
time.
The profile, sorted and tailed, looks like:
     31 run_timer_softirq          0.0718
     42 __flush_dcache_icache      0.5526
     94 invalidate_dcache_range    1.9583
    103 finish_task_switch         0.5598
    199 memset                     2.1630
    533 __do_softirq               2.3795
   4404 __copy_tofrom_user         7.8085
   9819 cpu_idle                 175.3393
  27760 default_idle             301.7391
  43154 total                      0.2316

So apart from some "__copy_tofrom_user", the CPU is just idling around.
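
To put a number on "idling around", the tick counts from the profile can
be turned into shares of the total. A minimal sketch (the helper name
tick_share is made up for this example):

```c
#include <assert.h>

/* Share of all profiler ticks that fall into the given symbols. */
double tick_share(const unsigned long *ticks, int n, unsigned long total)
{
    unsigned long sum = 0;

    assert(total > 0);
    for (int i = 0; i < n; i++)
        sum += ticks[i];
    return (double)sum / (double)total;
}
```

Feeding in cpu_idle and default_idle (9819 + 27760 of 43154 ticks) gives
roughly 87% idle; __copy_tofrom_user (4404 ticks) accounts for roughly
10%.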

Can anybody tell me where to look, i.e. where the time is spent between
the successful termination of a transfer and the start of the next io?
Userspace just reads BIG blocks (10MB or so), so userspace latency
doesn't seem to be the problem.

hdparm -T gives about 46MB/s, which is about half of our memory
performance.

Felix

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/





IBM OCP IDE fixes

2004-03-09 Thread Felix Domke

Hi,

it seems that there's a bug in drivers/ide/ppc/ibm_ocp_ide.c, in
ocp_ide_build_dmatable in the current (linuxppc) 2.6.
Explanation: bvprv gets initialized to NULL, and for each bio segment
BIOVEC_PHYS_MERGEABLE is called to see if it can be merged with the
previous one. In the first iteration there is no previous segment, so a
null pointer is passed to BIOVEC_PHYS_MERGEABLE, which doesn't check for
this condition, leading to an Oops.

As the first segment can never be merged with something else, checking
for a null bvprv should be enough.
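
The idea can be shown outside the kernel with a simplified model. This
is not the driver code: struct seg, phys_mergeable and build_table are
made-up names, and the model merges into an output table instead of
building PRD entries. It only illustrates why the "previous" pointer has
to be checked before the merge test.

```c
#include <assert.h>
#include <stddef.h>

/* One physically contiguous chunk: start address and length in bytes. */
struct seg {
    unsigned long paddr;
    unsigned long len;
};

/* Mergeable when `cur` starts exactly where `prev` ends.
 * Like the kernel macro, this does NOT tolerate prev == NULL. */
static int phys_mergeable(const struct seg *prev, const struct seg *cur)
{
    return prev->paddr + prev->len == cur->paddr;
}

/* Coalesce adjacent segments into `out`.
 * Returns the number of entries written, or -1 if `out` is too small. */
int build_table(const struct seg *in, size_t n, struct seg *out, size_t max_out)
{
    const struct seg *prev = NULL;   /* no previous segment yet */
    size_t used = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct seg *cur = &in[i];

        /* The NULL check comes first, so the first iteration never
         * dereferences a null pointer. */
        if (prev && phys_mergeable(prev, cur)) {
            out[used - 1].len += cur->len;
        } else {
            if (used == max_out)
                return -1;
            out[used++] = *cur;
        }
        prev = cur;
    }
    return (int)used;
}
```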


Speaking of ibm_ocp_ide.c, it should be added to the Makefile in
drivers/ide ("ide-core-$(CONFIG_BLK_DEV_IDE_STB04xxx) +=
ppc/ibm_ocp_ide.o"), and std_ide_cntl_scan must be called. Not sure if
my patch is the correct way here.

Additionally, the ocp driver issues IDE commands on its own in dma mode,
which is wrong for 48bit addressing. I made a simple workaround, but a
more generic function might be called instead.

Then there are some simple compile fixes (a missing header file, which
isn't of any use anyway, and replacing hw_init_dma_channel with
ppc4xx_init_dma_channel). ide_dma_off doesn't seem to be required
anymore.

Finally, the udelay(1000*1000) calls have to be replaced by mdelay(1000)
in the spinup wait. Maybe this loop should be replaced by the more
generic IDE spinup loop.
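
Background for the mdelay change, as a rough user-space model: udelay()
is meant for short delays, and on some architectures a very large
argument can overflow its internal arithmetic, so longer waits are split
into small steps. The names fake_udelay, model_mdelay and MAX_SAFE_US
below are stand-ins invented for this sketch; the limit of 2000us is an
assumption and not a value taken from the 4xx code.

```c
#include <assert.h>

/* Hypothetical upper limit for a single short delay, in microseconds. */
#define MAX_SAFE_US 2000UL

/* Total delay requested so far; the model records instead of spinning. */
static unsigned long total_us;

/* Stand-in for udelay(): refuses arguments above the assumed limit. */
static void fake_udelay(unsigned long us)
{
    assert(us <= MAX_SAFE_US);
    total_us += us;
}

/* Millisecond delay built from 1000us steps, so that every single
 * call stays well below the limit. */
void model_mdelay(unsigned long ms)
{
    while (ms--)
        fake_udelay(1000);
}
```

In this model a direct fake_udelay(1000 * 1000) would trip the
assertion, while model_mdelay(1000) requests the same second in 1000
small steps.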

Comments?


Felix

diff -Naur linuxppc-2.5-vanilla/drivers/ide/Makefile
linux-2.6/drivers/ide/Makefile
--- linuxppc-2.5-vanilla/drivers/ide/Makefile2004-03-02
22:17:17.0 +0100
+++ linux-2.6/drivers/ide/Makefile2004-03-04 18:33:12.0 +0100
@@ -37,6 +37,7 @@
 ide-core-$(CONFIG_BLK_DEV_MPC8xx_IDE)+= ppc/mpc8xx.o
 ide-core-$(CONFIG_BLK_DEV_IDE_PMAC)+= ppc/pmac.o
 ide-core-$(CONFIG_BLK_DEV_IDE_SWARM)+= ppc/swarm.o
+ide-core-$(CONFIG_BLK_DEV_IDE_STB04xxx) += ppc/ibm_ocp_ide.o

 obj-$(CONFIG_BLK_DEV_IDE)+= ide-core.o
 obj-$(CONFIG_IDE_GENERIC)+= ide-generic.o
diff -Naur linuxppc-2.5-vanilla/drivers/ide/ide.c
linux-2.6/drivers/ide/ide.c
--- linuxppc-2.5-vanilla/drivers/ide/ide.c2004-03-02
22:16:11.0 +0100
+++ linux-2.6/drivers/ide/ide.c2004-03-04 18:33:12.0 +0100
@@ -2156,7 +2156,7 @@
 pnpide_init(1);
 }
 #endif /* CONFIG_BLK_DEV_IDEPNP */
-#ifdef CONFIG_BLK_DEV_STD
+#if defined(CONFIG_BLK_DEV_STD) || defined(CONFIG_BLK_DEV_IDE_STB04xxx)
 {
 extern void std_ide_cntl_scan(void);
 std_ide_cntl_scan();
--- linuxppc-2.5-vanilla/drivers/ide/ppc/ibm_ocp_ide.c2004-03-02
22:16:52.0 +0100
+++ linux-2.6/drivers/ide/ppc/ibm_ocp_ide.c2004-03-04
19:04:29.0 +0100
@@ -23,7 +23,7 @@
 #include 
 #include 

-#include "ide_modes.h"
+// #include "ide_modes.h"

 #define IDE_VER"2.0"
 ppc_dma_ch_t dma_ch;
@@ -383,8 +383,8 @@
 else
 consistent_sync((void *)vaddr,
 size, PCI_DMA_FROMDEVICE);

-if (!BIOVEC_PHYS_MERGEABLE(bvprv, bvec)) {
+if (bvprv && !BIOVEC_PHYS_MERGEABLE(bvprv, bvec)) {
 if (ocp_ide_build_prd_entry(&table,
 prd_paddr,
 prd_size,
@@ -581,12 +580,18 @@
 {
 if (!ocp_ide_build_dmatable(drive, writing))
 return 1;
+
+int lba48bit;

 drive->waiting_for_dma = 1;
 if (drive->media != ide_disk)
 return 0;
+
+lba48bit = ((drive->id->cfs_enable_2 & 0x0400) ? 1 : 0) &&
(drive->addressing);
+
 ide_set_handler(drive, &ocp_ide_dma_intr, WAIT_CMD, NULL);
-HWIF(drive)->OUTB(writing ? WIN_WRITEDMA : WIN_READDMA,
+HWIF(drive)->OUTB(writing ? (lba48bit ? WIN_WRITEDMA_EXT :
WIN_WRITEDMA)
+: (lba48bit ? WIN_READDMA_EXT : WIN_READDMA),
  IDE_COMMAND_REG);
 return __ocp_ide_dma_begin(drive, writing);
 }
@@ -642,7 +647,7 @@
 if ((stat & 0x80) == 0) {
 break;
 }
-udelay(1000 * 1000);/* 1 second */
+mdelay(1000);/* 1 second */
 }

 printk(".");
@@ -657,7 +662,7 @@
 if ((stat & 0x80) == 0) {
 break;
 }
-udelay(1000 * 1000);/* 1 second */
+mdelay(1000);/* 1 second */
 }
 if( i < 30){
 outb_p(0xa0, io_ports[6]);
@@ -715,7 +720,7 @@
 dma_ch.ch_enable = 0;/* No chaining */
 dma_ch.tcd_disable = 1;/* No chaining */

-if (hw_init_dma_channel(IDE_DMACH, &dma_ch) != DMA_STATUS_GOOD)
+if (ppc4xx_init_dma_channel(IDE_DMACH, &dma_ch) != DMA_STATUS_GOOD)
 return -EBUSY;

 /* init CIC select2 reg to connect external DMA port 3 to internal
@@ -772,8 +777,10 @@

 if(!ocp_ide_spinup(hwif->index))
 return 0;
-
-return 1;
+
+  probe_hwif_init(hwif);
+
+return 1;
 }


@@ -821,7 +829,6 @@
 ide_hwifs[index].tuneproc = &ocp_ide_tune_drive;
 ide_hwifs[index].drives[0].autotune = 1;
 ide_hwifs[index].autodma = 1;
-ide_hwifs[index].ide_dma_off = &ocp_ide_dma_off;
 ide_hwifs[index].ide_

405 Critical Interrupts

2003-11-26 Thread Felix Domke

Hi,

i need a low-latency interrupt on a 405-based chip with linux 2.4.

Has anybody worked on this yet?

I thought about routing the CriticalInterrupt pretty much the same way
as the HardwareInterrupt, but with MSR_CE disabled. MSR_CE would then
stay enabled even in (normal) interrupts, so we probably have to add a
__crit_cli and __save_and_crit_cli, as someone already suggested.

Does CRIT_EXCEPTION work? Is do_IRQ reentrant? Should i use the same
interrupt processing as a normal hardware interrupt, with the exception
that only "critical"-flagged interrupts are processed?

Any suggestions?

The background: the IBM-STB045xx's capture port, which we use for
IR-decoding, doesn't have any buffering, so when a time-consuming
interrupt is processed (PIO network, maybe PIO ide), we miss IR cycles.

Felix







ppcboot

2003-03-20 Thread Felix Domke

Pascal wrote:
> I'm working on a motorola MPc8xx. i would like to know if it's possible
>  to run ppcboot trough an another ppcboot
on dbox2, we boot the ppcboot-elf from another first-stage bootloader.
works.


on another system (dreambox), i boot u-boot from another first-stage
bootloader (IBM's openbios). works.


just take out all the sdram init stuff et al and produce a file loadable
from your bootloader.


felix







IDE corruption w/ 48 Bit addressing

2003-03-16 Thread Felix Domke

> Users report harddisk corruption, and a quick test showed, that data
> written to 0x18+x (LBA sector 0xC00+x/512) is also written
> to x. (direct O_LARGEFILE-access to /dev/discs0/disc).
OK, update, this was a bit misleading:

when reading from the device, the upper 3 bytes aren't updated in the
right way, so the "previous content" (as specified in the ATA/6 specs)
seems to be invalid and contains, well, wrong content.
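
For reference, a sketch of how a 48bit LBA is spread over the "previous"
and "current" bytes of the three LBA registers (each register takes two
writes, the first one becoming the "previous content"). The struct and
function names are made up for this example; the bit assignment follows
the ATA/6 register description as far as i understand it.

```c
#include <assert.h>
#include <stdint.h>

/* Byte values for the LBA Low/Mid/High registers of a 48bit command.
 * The "prev" bytes are written first, the "cur" bytes second. */
struct lba48_bytes {
    uint8_t low_prev, mid_prev, high_prev; /* LBA bits 31:24, 39:32, 47:40 */
    uint8_t low_cur, mid_cur, high_cur;    /* LBA bits  7:0,  15:8,  23:16 */
};

/* Split a 48bit sector number into the six register bytes. */
struct lba48_bytes lba48_split(uint64_t lba)
{
    struct lba48_bytes b;

    assert(lba < (UINT64_C(1) << 48));
    b.low_cur   = (uint8_t)(lba >> 0);
    b.mid_cur   = (uint8_t)(lba >> 8);
    b.high_cur  = (uint8_t)(lba >> 16);
    b.low_prev  = (uint8_t)(lba >> 24);
    b.mid_prev  = (uint8_t)(lba >> 32);
    b.high_prev = (uint8_t)(lba >> 40);
    return b;
}

/* Sector number the drive would address for a given set of bytes. */
uint64_t lba48_join(const struct lba48_bytes *b)
{
    return (uint64_t)b->low_cur
         | (uint64_t)b->mid_cur   << 8
         | (uint64_t)b->high_cur  << 16
         | (uint64_t)b->low_prev  << 24
         | (uint64_t)b->mid_prev  << 32
         | (uint64_t)b->high_prev << 40;
}
```

If the three "prev" bytes hold stale or wrong values, lba48_join shows
which sector ends up being addressed instead of the intended one.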

felix







IDE corruption w/ 48 Bit addressing

2003-03-16 Thread Felix Domke

Hi,

i have a PPC-405 based board (IBM STB04500 if anyone cares), and i'm
using a Maxtor 6Y200L0, a 200GB harddisk drive.

Obviously this uses 48bit addressing, and i'm using linuxppc 2.4.21-pre4
devel. The same bug occurs with the 2.4.20 release.

Users report harddisk corruption, and a quick test showed, that data
written to 0x18+x (LBA sector 0xC00+x/512) is also written
to x. (direct O_LARGEFILE-access to /dev/discs0/disc).

This will of course corrupt the filesystem.

Now my questions:

  - is this a bug in the IDE low-level interface driver?
  - or maybe in the kernel?
  - or maybe fixed in newer versions?
  - why does the corruption start at this lba sector?


Users reported that this occurs with different HDD models and brands,
too, but only with 48bit drives. Everything else works perfectly.

felix







reading block of data from host problem!!!!

2003-03-06 Thread Felix Domke

Anand Franklin J wrote:
> Hello All,
> can any body tell what is the problem, it just hangs in reading the
> "block 1" of zImage.treeboot and not proceeding further from tftp
> booting process. I am using IBM power pc redwood 4.

Although this isn't related to linux, this probably only means that no
tftp server is running at all.

(yes, i got this "error report" from a customer, and this was the fault)

try "tftp localhost" -> "get zImage.treeboot" (or whatever) and see if
tftp is *really* working on your tftp server machine.

felix







file alignment of elf sections

2003-02-11 Thread Felix Domke

hi,

i'm currently porting u-boot to a ppc405-based board, but i'm failing
miserably at the first step - correct linking.

to make it short (and keep it a bit ontopic, sorry):

how can i tell ld to align the section in the FILE to 2^16? i have my
.text section starting at (load address) 5MB, and would like to have it
in the elf file at offset 64k. i saw lds producing exactly this output,
but in my case the .text section starts immediately after the elf
header. (the reason is that i have to convert the binary to a special
format for the primary bootloader, and the tool for converting is rather
dumb, but i don't want to invest time into fixing the tool since i
didn't write it and nobody else cares about it).

There must be a simple option, but i can't find it :/  (even after
googling around)

I have to admit that i did this before, but i can't remember HOW i did it.

felix







C++ Library recommendations ...

2003-01-28 Thread Felix Domke

Jaap-Jan Boor wrote:
> I just use libstd++ coming with gnu g++, it's not too big (shared ~300k)
> compared to glibc (shared ~1.2 M)

problem is that the STL is a template library, so a lot of code is
produced when using these templates.

i highly recommend using normal lists and casting pointers (like in good
old C times) again, even with the need to allocate two chunks per list
item (pointer and the data itself).
Using the STL makes your application MUCH bigger and often slower.
The STL is optimized for huge data structures, but for most things
memcpy'ing (using, for example, an array/vector) is much faster than
using a list or hash, even though those have the better - for example
linear or even log - complexity. What the STL guys forgot to keep in
mind is that 1000*linear (list insertions...) is still worse than 1*exp
(memcpy when doing vector insertions... but take this only as an
example) when your list has, for example, 5 entries.

and most lists are NOT accessed ten thousand times and do NOT have one
million entries, where 1*exp would be a LOT more than 1000*linear.
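
A minimal sketch of such a "good old C" list with cast pointers, where
every item costs two allocations (the node and the data it points to).
All names here are invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Generic singly linked list node; the payload is an untyped pointer
 * that the caller casts back to the real type. */
struct node {
    struct node *next;
    void *data;
};

/* Prepend `data`; returns the new head, or NULL if the node could not
 * be allocated (the old list is left untouched in that case). */
struct node *list_push(struct node *head, void *data)
{
    struct node *n;

    assert(data != NULL);   /* this sketch does not store NULL payloads */
    n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->next = head;
    n->data = data;
    return n;
}

/* Number of items in the list. */
size_t list_length(const struct node *head)
{
    size_t len = 0;

    for (; head != NULL; head = head->next)
        len++;
    return len;
}

/* Free all nodes; `free_data` (may be NULL) releases each payload. */
void list_free(struct node *head, void (*free_data)(void *))
{
    while (head != NULL) {
        struct node *next = head->next;

        if (free_data != NULL)
            free_data(head->data);
        free(head);
        head = next;
    }
}
```

The caller casts the payload back, e.g. *(int *)head->data for a list of
heap-allocated ints. There is no type checking, which is the price for
having the list code only once in the binary.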

just my 2 cent...

felix







non-PCI OHCI (STBxxxx)

2003-01-06 Thread Felix Domke

Hi,

i'm using an IBM STB04xxx-based system (PPC405 core with Set-Top-Box
MPEG functions as well as some other stuff), which has an integrated
OHCI controller i'd like to use.
Obviously there is no PCI bus on that system, so it's not as simple as
it could be.

I hacked a 2.4.19-preX ohci driver to use consistent_alloc and
consistent_sync instead of the pci functions, with a hardcoded base
address and IRQ. This worked (a bit), but was very unstable and stopped
working totally with -rc3. I don't know what exactly changed, since i'm
very new to USB, and i'm really unable to debug what's going wrong, for
example on "device not accepting new address", since this already
requires working USB transfers.

A bit disappointed by comments in ohci.h which state that it's "not so
easy" to use non-PCI OHCI controllers, i looked into the SA case, but
- well, it didn't help me too much and seems to require huge hacks
(for example they emulate the pci functions... or is that the way to go?)

I then tried to use a 2.5 kernel. The ohci-stuff is well structured
there, and i made some ohci-ocp.c and hacked the use of the
pci-functions again. Result was a working USB support, but somewhere
there's still an error, as there is some data inconsistency. for
example, i burned an audio cd, and it contained noise about every
second. When i read a FAT disk, there're randomly some "invalid cluster
chain" error messages etc. Maybe some cache problems. Don't know, and as
said, i'm unable to debug this further without help :(

So i'm asking: Is there any standard approach to this? Maybe there's
already a patch flying around? If someone from Monta Vista is reading
this: Is this going to be supported?

If not: How should this be done? What exactly are the issues regarding
consistent_alloc, consistent_sync versus their pci variants? Is it maybe
possible to USE the pci functions with some dummy pci device?

If i understand correctly,
consistent_alloc allocates contiguous, non-pageable memory which is
directly mapped to bus addresses,
consistent_sync flushes all writeback caches (if TODEVICE) or
invalidates them (if FROMDEVICE)

is this correct? Does pci_pool_alloc allocate compatible memory? does
pci_map_single do "nothing more" (in a functional sense, if we don't
look at other, more complex hardware/bridges) than a consistent_sync and
bus2virt?

And finally: I usually ioremap() to use hardware memory. What's the
difference between using ioremap() and bus2virt/virt2bus? are they
deprecated? are they only possible after an ioremap? or is
kernel memory mapped to bus addresses all the time, which can be
retrieved using virt2bus?

thanks in advance,
Felix Domke

