RE: ML403 gigabit ethernet bandwidth - 2.6 kernel

2007-06-25 Thread Greg Crocker

I was able to achieve ~320 Mbit/sec data rate using the Gigabit System
Reference Design (GSRD XAPP535/536) from Xilinx.  This utilizes the
LocalLink TEMAC to perform the transfers.  The reference design provides the
Linux 2.4 drivers that can be ported to Linux 2.6 with a little effort.

This implementation did not use checksum offloading and the data rates were
achieved using TCP_STREAM on netperf.

Greg
___
Linuxppc-embedded mailing list
Linuxppc-embedded@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-embedded

RE: Follow up on 4 Gig of DDR on MPC8548E

2007-06-25 Thread Benjamin Herrenschmidt

> apparently after more investigations - it looks like there is something in
> the ext2 driver code that is mal-adjusted... I haven't talked to the guy
> today who was looking at that - but the ext2 driver code that was opening
> a 'virtual file' / console had some problems mapping that space - again,
> my gut is telling me ever more strongly that there is a problem with
> signed/unsigned... now deeper in the ext2 code...

I very much doubt there is a signed/unsigned issue in ext2 or
elsewhere...

Do you have CONFIG_HIGHMEM enabled? What are your settings for KERNELBASE
and PAGE_OFFSET?
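
(For illustration - the ex_ names below are hypothetical, not verbatim kernel
source - this is roughly why those two settings matter on 32-bit PowerPC: the
lowmem linear map is just a fixed offset, so with the usual 0xc0000000 base
well under 1 GB of RAM is directly mapped and the rest needs CONFIG_HIGHMEM.)

    /* Illustrative only: the ex_ names are hypothetical, not the kernel's
     * real macros.  With the usual KERNELBASE/PAGE_OFFSET of 0xc0000000,
     * only the RAM that fits in the virtual space above that base (well
     * under 1 GB once vmalloc/ioremap space is reserved) is directly
     * mapped; everything above needs CONFIG_HIGHMEM and kmap(). */
    #define EX_PAGE_OFFSET  0xc0000000UL

    static inline void *ex_phys_to_virt(unsigned long pa)
    {
            return (void *)(pa + EX_PAGE_OFFSET);   /* lowmem pages only */
    }

    static inline unsigned long ex_virt_to_phys(const void *va)
    {
            return (unsigned long)va - EX_PAGE_OFFSET;
    }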

Ben.


___
Linuxppc-embedded mailing list
Linuxppc-embedded@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-embedded


RE: Follow up on 4 Gig of DDR on MPC8548E

2007-06-25 Thread Morrison, Tom
>> 0x_ to 0x_ or do you have 64GB of mem...

4 GB of DDR memory...well, let's say 3 GB for the moment...

The LAWs are set up in priority order - thus, if you have overlapping
regions (e.g. PCI & DDR memory), the one with the lower-numbered LAW
is the one the access is mapped to.

>> If you have all physical mem in the first 32 bits, where are your PCI
>> windows set?
>> And in most cases the PCI devices (if they are bus-mastering) need a 1:1
>> inbound mapping to be able to write to memory.

You can use the LAWs to remap anything to anywhere...
see above for the priority rule - but in our case, we are mapping
the PCI/PEX/Local Bus to the last 4 GIG at a higher-priority
(lower-numbered) LAW than the DDR...that is how the MPC8548
spec says it works...and it does - as long as you don't tell
the kernel there is more than 2 GIG (and you ioremap that last
GIG to access the full 'other' 2 GIG)...
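
To make the priority rule concrete, here is a rough sketch - symbolic names
only, not the actual MPC8548 register layout (the real setup goes through the
LAWBARn/LAWARn registers described in the reference manual):

    /* Hypothetical structure for illustration; lower index = higher
     * priority when windows overlap. */
    struct ex_law {
            unsigned int       index;     /* LAW number */
            unsigned long long base;      /* physical base (36-bit space) */
            unsigned int       size_log2; /* window size = 2^size_log2 bytes */
            const char        *target;    /* target interface */
    };

    /* A PCI/PEX window placed at a lower-numbered (higher-priority) LAW
     * than the DDR window: accesses in the overlapping range go to PCI,
     * hiding that part of DDR from the bus. */
    static const struct ex_law ex_laws[] = {
            { .index = 0, .base = 0x0C0000000ULL, .size_log2 = 30, .target = "PCI/PEX" },
            { .index = 1, .base = 0x000000000ULL, .size_log2 = 32, .target = "DDR"     },
    };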
 
that is not the issue here
 
apparently after more investigations - it looks like there is something in
the ext2 driver code that is mal-adjusted... I haven't talked to the guy
today who was looking at that - but the ext2 driver code that was opening
a 'virtual file' / console had some problems mapping that space - again,
my gut is telling me ever more strongly that there is a problem with
signed/unsigned... now deeper in the ext2 code...

I am at the Freescale Technical Forum for the next few days - if any of you
guys are down here in Florida...I am intending to track down a few
Freescale folks on this issue...:-)
 
Tom
___
Linuxppc-embedded mailing list
Linuxppc-embedded@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-embedded


Re: Mem-2-Mem DMA - Generalized API

2007-06-25 Thread Matt Sealey
Clifford Wolf wrote:
> Hi,
>
> However, I don't think that implementing stuff like memset() in a DMA
> controller is any good, because that would just flood the memory bus,
> which in 99% of all cases would then block the CPU until the DMA is
> finished.
>
> It would however cost less power than doing that in the CPU. ;-)

At least while the DMA transfer is happening, you could preempt to some
other task. Would it really flood the memory bus? Would a DMA transfer
really stall the CPU's memory accesses that much more than they would be
stalled anyway?

I think the controller would have to be extremely badly designed to even be
ABLE to do that, or at least you'd have to be doing some incredibly unwise
things to flood the bus like that.

>> Indeed. I wonder if we could pull apart the IOAT/DMA stuff and genericise
>> it (it should be possible) or simply add to it, or if making a powerpc
>> specific dma engine abstraction would be an easier idea.
>
> I don't think that this would actually be powerpc specific in any way. But
> since such general purpose dma controllers are more common in embedded
> hardware, this still seems to be the right place to discuss the issue.

I meant powerpc platform (as in ARCH=powerpc) specific. Rather than dropping
it in the global drivers, just keep it as a library for powerpc. Everyone
else can get it later with a move into the full tree. As long as the headers
have common, easy to get to names that do not conflict with anything
preexisting, it would not affect anything.

Taking IOAT as an example and fixing its weirdness would be a better start
than making a whole new API. But doing development *WITH* IOAT means
potentially trashing all.. umm.. 4 of its users, plus the weird Linux kernel
tripling of development cost that comes with heavily updating an actively
maintained subsystem that everyone else wants to touch - that would be
detrimental. We don't want to break Intel, and we don't want to be tracking
Intel's patches or having extra weirdness break in (or the number of users
of that DMA system to explode underneath New DMA Development).

-- 
Matt Sealey <[EMAIL PROTECTED]>
Genesi, Manager, Developer Relations

___
Linuxppc-embedded mailing list
Linuxppc-embedded@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-embedded


Katmai w/ DENX git - "init has generated signal 4" error

2007-06-25 Thread Stephen Winiecki
I'm using the Denx linux-2.6-denx.git repository with a Katmai board.   I 
want to boot off a disk (initialized w/ Debian 4.0).  I've installed a 
Promise Ultra133 Tx2 IDE controller card in the PCI slot, and configured 
it in the kernel.I boot this same disk w/ IDE card on other 4xx boards 
without a problem (Bamboo w/ Denx 4.1 for example).  I am getting an error 
"init has generated signal 4 but has no handler for it".

## Booting image at 0020 ...
   Image Name:   Linux-2.6.22-rc5-gc8144983-dirty
   Image Type:   PowerPC Linux Kernel Image (gzip compressed)
   Data Size:1328285 Bytes =  1.3 MB
   Load Address: 
   Entry Point:  
   Verifying Checksum ... OK
   Uncompressing Kernel Image ... OK
Linux version 2.6.22-rc5-gc8144983-dirty ([EMAIL PROTECTED])
(gcc version 4.0.0 (DENX ELDK 4.1 4.0.0)) #4 Mon Jun 25 13:59:54 EDT 2007
AMCC PowerPC 440SPe Katmai Platform
Zone PFN ranges:
  DMA 0 ->   131072
  Normal 131072 ->   131072
early_node_map[1] active PFN ranges
0:0 ->   131072
Built 1 zonelists.  Total pages: 130048
Kernel command line: console=ttyS0,115200 root=/dev/hde3 rw ip=dhcp
PID hash table entries: 2048 (order: 11, 8192 bytes)
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 516864k available (2008k kernel code, 676k data, 160k init, 0k 
highmem)
Mount-cache hash table entries: 512
NET: Registered protocol family 16
PCI: Probing PCI hardware
SCSI subsystem initialized
NET: Registered protocol family 2
IP route cache hash table entries: 16384 (order: 4, 65536 bytes)
TCP established hash table entries: 65536 (order: 7, 524288 bytes)
TCP bind hash table entries: 65536 (order: 6, 262144 bytes)
TCP: Hash tables configured (established 65536 bind 65536)
TCP reno registered
io scheduler noop registered
io scheduler anticipatory registered (default)
io scheduler deadline registered
io scheduler cfq registered
Generic RTC Driver v1.07
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at MMIO 0x0 (irq = 0) is a 16550A
serial8250: ttyS1 at MMIO 0x0 (irq = 1) is a 16550A
serial8250: ttyS2 at MMIO 0x0 (irq = 37) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
xsysace xsysace.0: Xilinx SystemACE revision 1.0.12
xsysace xsysace.0: capacity: 256512 sectors
 xsa: xsa1
Intel(R) PRO/1000 Network Driver - version 7.3.20-k2
Copyright (c) 1999-2006 Intel Corporation.
PPC 4xx OCP EMAC driver, version 3.54
mal0: initialized, 1 TX channels, 1 RX channels
eth0: emac0, MAC 00:01:73:77:55:27
eth0: found Generic MII PHY (0x01)
e100: Intel(R) PRO/100 Network Driver, 3.5.17-k4-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with 
idebus=xx
PDC20269: IDE controller at PCI slot :00:01.0
PDC20269: chipset revision 2
PDC20269: ROM enabled at 0x000dc000
PDC20269: PLL input clock is 33309 kHz
PDC20269: 100% native mode on irq 52
ide2: BM-DMA at 0xffd0-0xffd7, BIOS settings: hde:pio, hdf:pio
ide3: BM-DMA at 0xffd8-0xffdf, BIOS settings: hdg:pio, hdh:pio
hde: FUJITSU MHT2040AH, ATA DISK drive
hde: host side 80-wire cable detection failed, limiting max speed to 
UDMA33
ide2 at 0xfff8-0x,0xfff6 on irq 52
hde: max request size: 128KiB
hde: 78140160 sectors (40007 MB) w/8192KiB Cache, CHS=65535/16/63, 
UDMA(33)
hde: cache flushes supported
 hde: hde1 hde2 hde3
i2c /dev entries driver
IBM IIC driver v2.1
ibm-iic0: using standard (100 kHz) mode
ds1307 1-0068: rtc core: registered ds1307 as rtc0
ibm-iic1: using standard (100 kHz) mode
TCP cubic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
ds1307 1-0068: setting the system clock to 2000-04-30 08:13:25 (957082405)
eth0: link is up, 100 FDX
Sending DHCP requests ., OK
IP-Config: Got DHCP answer from 255.255.255.255, my address is 
9.27.218.226
IP-Config: Complete:
  device=eth0, addr=9.27.218.226, mask=255.255.255.128, 
gw=9.27.218.129,
 host=9.27.218.226, domain=raleigh.ibm.com, nis-domain=(none),
 bootserver=255.255.255.255, rootserver=255.255.255.255, rootpath=
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hde3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem).
Freeing unused kernel memory: 160k init
init has generated signal 4 but has no handler for it
Kernel panic - not syncing: Attempted to kill init!
Rebooting in 180 seconds..

Not sure if I'm missing something obvious - any help appreciated.

Thanks,

Steve


___
Linuxppc-embedded mailing list
Linuxppc-embedded@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-embedded


Re: Mem-2-Mem DMA - Generalized API

2007-06-25 Thread Clifford Wolf
Hi,

On Mon, Jun 25, 2007 at 03:31:59PM +0100, Matt Sealey wrote:
> > The main question remains: Is it possible to have a flexible cross platform
> > DMA API which handles even complex requests and does scheduling,
> > prioritizing, queuing, locking, (re-)building/caching of SG lists... 
> > automagically.
> 
> I would think so. I think there is a fairly generic example in many parts
> of the Linux kernel. Dare I say the Via Unichrome AGP subsystem? And a
> bunch of the ARM/OMAP platforms..? A lot of the code is even identical,
> I wonder why it isn't some library rather than platform drivers.

I've put online a 'draft header file' for an API as I would have expected
it to look:

http://www.clifford.at/priv/dmatransfer.h

I'd love to hear your feedback on it.

One issue I'm absolutely not sure about atm is the different buses and
their address spaces. The design in the header file works directly on
'bus addresses' (the thing accessible through /dev/mem). Does anyone know
of a case where this may be insufficient?
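
For discussion, the kind of request descriptor I would picture for such an
interface might look roughly like this - a hypothetical sketch only, not the
contents of the dmatransfer.h linked above:

    /* Hypothetical sketch: the dmatransfer_* and DMATRANSFER_* names are
     * invented here for illustration and are not taken from the draft header. */
    #include <linux/types.h>

    struct dmatransfer_sg {
            u64    src;                     /* source bus address */
            u64    dst;                     /* destination bus address */
            size_t len;                     /* bytes in this chunk */
    };

    #define DMATRANSFER_SRC_FIFO  (1 << 0)  /* do not increment the source */
    #define DMATRANSFER_DST_FIFO  (1 << 1)  /* do not increment the destination */
    #define DMATRANSFER_FILL      (1 << 2)  /* memset-style fill from a pattern */

    struct dmatransfer_request {
            struct dmatransfer_sg *sg;      /* scatter/gather list */
            unsigned int           sg_count;
            unsigned int           flags;   /* DMATRANSFER_* */
            void (*complete)(struct dmatransfer_request *req, int status);
            void                  *context; /* caller cookie for the callback */
    };

    /* Submit a request; the backend picks a channel or falls back to the CPU. */
    int dmatransfer_submit(struct dmatransfer_request *req);

Working directly on bus addresses keeps the descriptor simple; the open
question is whether a single bus-address space is enough on systems with
several distinct buses.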

> > Filling memory with zero is also a simple task for a DMA engine.
> > (Thinking about malloc() and memset())
> 
> Also xor and logical operations, byte swapping huge chunks of data, that
> kind of thing. Most DMA engines in SoCs have cute features like that. I
> think BestComm can even calculate CRCs for IP packets.

I haven't added it yet, but such things could be encoded using the
DMATRANSFER_CHUNK_* and DMATRANSFER_* flags.

However, I don't think that implementing stuff like memset() in a DMA
controller is any good, because that would just flood the memory bus,
which in 99% of all cases would then block the CPU until the DMA is
finished.

It would however cost less power than doing that in the CPU. ;-)

> Indeed. I wonder if we could pull apart the IOAT/DMA stuff and genericise
> it (it should be possible) or simply add to it, or if making a powerpc
> specific dma engine abstraction would be an easier idea.

I don't think that this would actually be powerpc specific in any way. But
since such general purpose dma controllers are more common in embedded
hardware, this still seems to be the right place to discuss the issue.

yours,
 - clifford

-- 
Relax, its only ONES and ZEROS!
___
Linuxppc-embedded mailing list
Linuxppc-embedded@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-embedded


Re: Mem-2-Mem DMA - Generalized API

2007-06-25 Thread Clifford Wolf
Hi,

On Mon, Jun 25, 2007 at 12:00:03PM -0500, Olof Johansson wrote:
> That's the case with the dma engine framework today. Expand it, extend
> it, fix it and improve it. Don't duplicate, re-invent and re-implement.

I'm not sure if I can agree with that.

The core idea behind the dma engine framework seems to be to statically
assign dma channels to device drivers. I think that the channels should
be dynamically assigned.

IMO, writing an alternative implementation is much easier than hacking that
into the existing framework, especially since the existing framework has
only one backend driver and only one user.

yours,
 - clifford

-- 
/* You are not expected to understand this */
___
Linuxppc-embedded mailing list
Linuxppc-embedded@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-embedded


RE: Follow up on 4 Gig of DDR on MPC8548E

2007-06-25 Thread Rune Torgersen
> From: Morrison, Tom
> Sent: Friday, June 22, 2007 12:50 PM
> Setup:
>1) 4 Gig of DDR RAM (at physical addresses 0-0xF__)

0x_ to 0x_ or do you have 64GB of mem...

> >> EXT2-fs warning: mounting unchecked fs, running e2fsck is 
> recommended
> >> VFS: Mounted root (ext2 filesystem).
> >>  316k init
> >> EXT2-fs error (device sda1): ext2_check_page: bad entry in 
> directory
> #2: >> unaligned directory entry - offset=0, inode=128,
> rec_len=8961,name_len=69
> >> Warning: unable to open an initial console.

> My gut tells me this might be something to do with the 2 
> Gig boundary and specifically a "signed" versus "unsigned" 
> address/offsets mismatch maybe somewhere in the file system??

If you have all physical mem in the first 32 bits, where are your PCI
windows set?
And in most cases the PCI devices (if they are bus-mastering) need a 1:1
inbound mapping to be able to write to memory.


___
Linuxppc-embedded mailing list
Linuxppc-embedded@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-embedded


Re: Mem-2-Mem DMA - Generalized API

2007-06-25 Thread Olof Johansson
On Mon, Jun 25, 2007 at 03:31:59PM +0100, Matt Sealey wrote:
> 
> Indeed. I wonder if we could pull apart the IOAT/DMA stuff and genericise
> it (it should be possible) or simply add to it, or if making a powerpc
> specific dma engine abstraction would be an easier idea.

It's hard to anticipate all possible uses of a framework when you first
write it.  So, when you first write it with one device in mind, it's
fairly obvious that it might not fit well with the second device that
will use it. That's the case of drivers/dma and IOAT today.

That's the case with the dma engine framework today. Expand it, extend
it, fix it and improve it. Don't duplicate, re-invent and re-implement.


-Olof
___
Linuxppc-embedded mailing list
Linuxppc-embedded@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-embedded


RE: ML403 gigabit ethernet bandwidth - 2.6 kernel

2007-06-25 Thread Glenn . G . Hart

All,

I am also very interested in the network throughput.  I am using the Avnet
Mini-Module, which has a V4FX12.  The ML403 is very close to the
Mini-Module.  I am getting a throughput of about 100 Mbps.  The biggest
difference was turning on the cache; 100 MHz vs. 300 MHz only improved the
performance slightly.  Using the checksum offloading was also a big help in
getting the throughput up.  The RX Threshold also helped, but the jumbo
frames did not seem to help.  I am not sure what I can do to get the 300
Mbps Ming is getting.  I saw in a previous post that someone was using a
128k FIFO depth; I am using a 32k depth.

Glenn




 
"Ming Liu" <[EMAIL PROTECTED]>@ozlabs.org wrote on 06/25/2007 06:03 AM
(embedded image moved to file: pic11478.jpg)
Sent by: [EMAIL PROTECTED]

To:      [EMAIL PROTECTED]
cc:      [EMAIL PROTECTED], linuxppc-embedded@ozlabs.org
Subject: RE: ML403 gigabit ethernet bandwidth - 2.6 kernel


Dear Mohammad,

>The results are as follows:
>PC-->ML403
>TCP_SENDFILE : 38Mbps
>
>ML403--->PC
>TCP_SENDFILE: 155Mbps

This result is unreasonable: since the PC is more powerful than your board,
PC->board should be faster than board->PC.

>The transfer rate from ML403 to PC has improved by a factor of 2,
>I see on the posts here in the mailing list that you have reached a
bandwidth of 301Mbps.

Yes, with all the features which could improve performance enabled, we can
get around 300Mbps for TCP transfer. One more hint: did you enable caches
on your system? Perhaps that will help. Anyway, double-check your hardware
design to make sure all features are enabled. That's all I can suggest.

BR
Ming


>
>
>
>
>
> > From: [EMAIL PROTECTED]
> > To: [EMAIL PROTECTED]; [EMAIL PROTECTED];
linuxppc-embedded@ozlabs.org; [EMAIL PROTECTED]
> > Subject: RE: ML403 gigabit ethernet bandwidth - 2.6 kernel
> > Date: Sat, 23 Jun 2007 19:10:16 +
> >
> > Use the following command in Linux please:
> >
> > ifconfig eth0 mtu 8982
> >
> > As well you should do that on your PC in the measurement.
> >
> > Ming
> >
> >
> > >From: Mohammad Sadegh Sadri
> > >To: Ming Liu ,
> > ,,
> >
> > >Subject: RE: ML403 gigabit ethernet bandwidth - 2.6 kernel
> > >Date: Sat, 23 Jun 2007 19:08:29 +
> > >
> > >
> > >Dear Ming,
> > >
> > >Really thanks for reply,
> > >
> > >about thresholds and waitbound OK! I'll adjust them in adapter.c ,
> > >
> > >but what about enabling jumbo frames? should I do any thing special to
> > enable Jumbo fram support?
> > >
> > >we were thinking that it is enabled by default. Is it?
> > >
> > >thanks
> > >
> > >
> > >
> > >
> > >
> > > > From: [EMAIL PROTECTED]
> > > > To: [EMAIL PROTECTED]; [EMAIL PROTECTED];
> > linuxppc-embedded@ozlabs.org; [EMAIL PROTECTED]
> > > > Subject: RE: ML403 gigabit ethernet bandwidth - 2.6 kernel
> > > > Date: Sat, 23 Jun 2007 18:48:19 +
> > > >
> > > > Dear Mohammad,
> > > > There are some parameters which could be adjusted to improve the
> > > > performance. They are: TX and RX_Threshold TX and RX_waitbound. In
my
> > > > system, we use TX_Threshold=16 and Rx_Threshold=8 and both
waitbound=1.
> > > >
> > > > Also Jumbo frame of 8982 could be enable.
> > > >
> > > > Try those hints and share your improvement with us.
> > > >
> > > > BR
> > > > Ming
> > > >
> > > > >From: Mohammad Sadegh Sadri
> > > > >To: Andrei Konovalov , Linux PPC Linux
> > > > PPC, Grant Likely
> > > > >Subject: ML403 gigabit ethernet bandwidth - 2.6 kernel
> > > > >Date: Sat, 23 Jun 2007 12:19:12 +
> > > > >
> > > > >
> > > > >Dear all,
> > > > >
> > > > >Recently we did a set of tests on performance of virtex 4FX hard
TEMAC
> > > > module using ML403
> > > > >
> > > > >we studied all of the posts here carefully: these are the system
> > > > characteristics;
> > > > >
> > > > >Board : ML403
> > > > >EDK: EDK9.1SP2
> > > > >Hard TEMAC version and PLTEMAC version are both 3.0.a
> > > > >PPC clock frequency :  300MHz
> > > > >Kernel : 2.6.21-rc7 , downloaded from grant's git tree some thing
near
> > one
> > > > week ago
> > > > >DMA type: 3 (sg dma)
> > > > >DRE : enabled for TX and RX, (2)
> > > > >CSUM offload is enabled for both of TX and RX
> > > > >tx and rx fifo sizes : 131072 bits
> > > > >
> > > > >the board comes up over NFS root file system completely and
without
> > any
> 

UART for MPC5200 with DTR, DSR, RI and DCD?

2007-06-25 Thread Mattias Bostr
Hi,

Does anybody know of a UART driver for the MPC5200 that supports use of
the DTR, DSR, RI and DCD signals?

Regards,
Mattias


___
Linuxppc-embedded mailing list
Linuxppc-embedded@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-embedded


Re: Mem-2-Mem DMA - Generalized API

2007-06-25 Thread Matt Sealey

Clemens Koller wrote:
> Hello, Matt!
> 
>> There is so much you can do with most SoC DMA controllers, and it's not
>> even limited to PowerPC (most ARM/XScale SoCs have very capable devices
>> inside too). I can only imagine that nobody got excited over IOAT because
>> the entire programming interface stinks of "offloading gigabit ethernet"
>> and not much else.
> 
> The main question remains: Is it possible to have a flexible cross platform
> DMA API which handles even complex requests and does scheduling,
> prioritizing, queuing, locking, (re-)building/caching of SG lists... 
> automagically.

I would think so. I think there is a fairly generic example in many parts
of the Linux kernel. Dare I say the Via Unichrome AGP subsystem? And a
bunch of the ARM/OMAP platforms..? A lot of the code is even identical,
I wonder why it isn't some library rather than platform drivers.

> Filling memory with zero is also a simple task for a DMA engine.
> (Thinking about malloc() and memset())

Also xor and logical operations, byte swapping huge chunks of data, that
kind of thing. Most DMA engines in SoCs have cute features like that. I
think BestComm can even calculate CRCs for IP packets.

> The problem is IMHO similar to video acceleration. Within the
> Xorg's XAA/EXA/whatever framework, the drivers accelerate certain
> calls if the hardware has the capability to do so. Other calls fall back
> to some default non accelerated memcpy() & friends.
> 
> Sounds like a lot of fun... replacing kernel's and libc's memcpy() with
> memcpy_with_dma_if_possible(). :-)

Indeed. I wonder if we could pull apart the IOAT/DMA stuff and genericise
it (it should be possible) or simply add to it, or if making a powerpc
specific dma engine abstraction would be an easier idea.

Probably the latter, merged with the former at a later date, would be
easier to manage. Take inspiration, but don't be bound by Intel's weird
"new" (i.e. 15-year-old) concept?

-- 
Matt Sealey <[EMAIL PROTECTED]>
Genesi, Manager, Developer Relations
___
Linuxppc-embedded mailing list
Linuxppc-embedded@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-embedded


Re: Mem-2-Mem DMA - Generalized API

2007-06-25 Thread Clemens Koller
Hello, Matt!

Matt Sealey schrieb:
> IOAT and Intel's DMA engine driver is very IOAT specific in places..
> 
> I had a peek at it as I have a little interest in the concept; at least the
> two platforms Genesi has been supporting (Pegasos and Efika) have quite
> competent DMA engines which are woefully underused (i.e. not at all).

True.

> There exists a Marvell DMA driver somewhere (I have a copy, someone on
> this list posted it about a year ago) and while the MPC5200B doesn't have
> explicit support for DMA from memory to memory (although memory to SRAM
> might work in chunks, or memory to a FIFO wired as a loopback like in
> the docs..??)
 >
> There is so much you can do with most SoC DMA controllers, and it's not
> even limited to PowerPC (most ARM/XScale SoCs have very capable devices
> inside too). I can only imagine that nobody got excited over IOAT because
> the entire programming interface stinks of "offloading gigabit ethernet"
> and not much else.

The main question remains: Is it possible to have a flexible cross platform
DMA API which handles even complex requests and does scheduling, prioritizing,
queuing, locking, (re-)building/caching of SG lists... automagically.

It could fall back to the CPU's memcpy() if the DMA machine can't be used
because all channels are already busy, the requested memory isn't DMA-able,
or the request is just too small for it to make sense to set up a DMA
channel.
Filling memory with zero is also a simple task for a DMA engine.
(Thinking about malloc() and memset())

The problem is IMHO similar to video acceleration. Within the
Xorg's XAA/EXA/whatever framework, the drivers accelerate certain
calls if the hardware has the capability to do so. Other calls fall back
to some default non accelerated memcpy() & friends.

Sounds like a lot of fun... replacing kernel's and libc's memcpy() with
memcpy_with_dma_if_possible(). :-)
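
A minimal sketch of that fallback wrapper, assuming some hypothetical
ex_dma_* backend helpers (not a real kernel or libc API) that report whether
DMA is usable and perform a blocking copy:

    #include <string.h>
    #include <stddef.h>
    #include <stdbool.h>

    #define EX_DMA_MIN_LEN 4096   /* below this, setup cost outweighs the copy */

    /* Assumed backend hooks (hypothetical): ex_dma_memcpy() is expected to
     * block until the transfer completes and return 0 on success, non-zero
     * if no channel was available. */
    bool ex_dma_region_ok(const void *dst, const void *src, size_t len);
    int  ex_dma_memcpy(void *dst, const void *src, size_t len);

    void *memcpy_with_dma_if_possible(void *dst, const void *src, size_t len)
    {
            /* too small, not DMA-able, or no free channel: plain memcpy */
            if (len < EX_DMA_MIN_LEN || !ex_dma_region_ok(dst, src, len) ||
                ex_dma_memcpy(dst, src, len) != 0)
                    memcpy(dst, src, len);

            return dst;
    }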

Best regards,
-- 
Clemens Koller
__
R&D Imaging Devices
Anagramm GmbH
Rupert-Mayer-Straße 45/1
Linhof Werksgelände
D-81379 München
Tel.089-741518-50
Fax 089-741518-19
http://www.anagramm-technology.com
___
Linuxppc-embedded mailing list
Linuxppc-embedded@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-embedded


Re: ML403 gigabit ethernet bandwidth - 2.6 kernel

2007-06-25 Thread Bhupender Saharan

Hi,

We need to find out where the bottleneck is.

1. Run vmstat on the ML403 board and find out what percentage of the CPU is
busy while you are transferring the file. That will show whether the CPU is
busy or not.
2. Run oprofile and find out which routines are eating away the CPU time.

Once we have data from both of the above, we can find the bottlenecks.


Regards
Bhupi


On 6/23/07, Mohammad Sadegh Sadri <[EMAIL PROTECTED]> wrote:



Dear all,

Recently we did a set of tests on performance of virtex 4FX hard TEMAC
module using ML403

we studied all of the posts here carefully: these are the system
characteristics;

Board : ML403
EDK: EDK9.1SP2
Hard TEMAC version and PLTEMAC version are both 3.0.a
PPC clock frequency :  300MHz
Kernel : 2.6.21-rc7 , downloaded from grant's git tree some thing near one
week ago
DMA type: 3 (sg dma)
DRE : enabled for TX and RX, (2)
CSUM offload is enabled for both of TX and RX
tx and rx fifo sizes : 131072 bits

the board comes up over NFS root file system completely and without any
problems.

PC system used for these tests is : CPU P4 Dual Core, 3.4GHz , 2Gigabytes
memory, Dual gigabit ethernet port, running linux 2.6.21.3
We have tested the PC system band width and it can easily reach 966mbits/s
when connected to the same PC. ( using the same cross cable used for ml403
test)

Netperf is compiled with TCP SEND FILE enabled, ( -DHAVE_SENDFILE)

(from board to PC)
netperf -t TCP_SENDFILE -H 10.10.10.250 -F /boot/zImage.elf -- -m 16384 -s
87380 -S 87380

the measured bandwidth for this test was just 40.66Mbits.
It is also true for netperf from PC to board.

we do not have any more idea about what we should do to improve the
bandwidth.
any help or ideas is appreciated...

_
Connect to the next generation of MSN Messenger

http://imagine-msn.com/messenger/launch80/default.aspx?locale=en-us&source=wlmailtagline
___
Linuxppc-embedded mailing list
Linuxppc-embedded@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-embedded

___
Linuxppc-embedded mailing list
Linuxppc-embedded@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-embedded

Re: Mem-2-Mem DMA - Generalized API

2007-06-25 Thread Matt Sealey
IOAT and Intel's DMA engine driver is very IOAT specific in places..

I had a peek at it as I have a little interest in the concept; at least the
two platforms Genesi has been supporting (Pegasos and Efika) have quite
competent DMA engines which are woefully underused (i.e. not at all).

There exists a Marvell DMA driver somewhere (I have a copy, someone on
this list posted it about a year ago) and while the MPC5200B doesn't have
explicit support for DMA from memory to memory (although memory to SRAM
might work in chunks, or memory to a FIFO wired as a loopback like in
the docs..??)

There is so much you can do with most SoC DMA controllers, and it's not
even limited to PowerPC (most ARM/XScale SoCs have very capable devices
inside too). I can only imagine that nobody got excited over IOAT because
the entire programming interface stinks of "offloading gigabit ethernet"
and not much else.

-- 
Matt Sealey <[EMAIL PROTECTED]>
Genesi, Manager, Developer Relations

Arnd Bergmann wrote:
> On Sunday 24 June 2007, Clifford Wolf wrote:
>> I'm working on an MPC8349E based project and as some of you might know this
>> chip has a four channel (bus-) memory-to-memory DMA controller.
>>
>> Unfortunately the linux kernel is atm lacking a generic interface for such
>> DMA controllers.
> 
> So what's wrong with the include/linux/dmaengine.h API? I thought it was
> designed to cover this sort of DMA controller?
> 
>   Arnd <><
> ___
> Linuxppc-embedded mailing list
> Linuxppc-embedded@ozlabs.org
> https://ozlabs.org/mailman/listinfo/linuxppc-embedded
___
Linuxppc-embedded mailing list
Linuxppc-embedded@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-embedded


RE: ML403 gigabit ethernet bandwidth - 2.6 kernel

2007-06-25 Thread Ming Liu
Dear Mohammad,

>The results are as follows:
>PC-->ML403
>TCP_SENDFILE : 38Mbps
>
>ML403--->PC
>TCP_SENDFILE: 155Mbps

This result is unreasonable: since the PC is more powerful than your board,
PC->board should be faster than board->PC.

>The transfer rate from ML403 to PC has improved by a factor of 2,
>I see on the posts here in the mailing list that you have reached a
bandwidth of 301Mbps.

Yes, with all the features which could improve performance enabled, we can
get around 300Mbps for TCP transfer. One more hint: did you enable caches
on your system? Perhaps that will help. Anyway, double-check your hardware
design to make sure all features are enabled. That's all I can suggest.

BR
Ming


>
>
>
>
>
> > From: [EMAIL PROTECTED]
> > To: [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
linuxppc-embedded@ozlabs.org; [EMAIL PROTECTED]
> > Subject: RE: ML403 gigabit ethernet bandwidth - 2.6 kernel
> > Date: Sat, 23 Jun 2007 19:10:16 +
> >
> > Use the following command in Linux please:
> >
> > ifconfig eth0 mtu 8982
> >
> > As well you should do that on your PC in the measurement.
> >
> > Ming
> >
> >
> > >From: Mohammad Sadegh Sadri
> > >To: Ming Liu ,
> > ,,
> >
> > >Subject: RE: ML403 gigabit ethernet bandwidth - 2.6 kernel
> > >Date: Sat, 23 Jun 2007 19:08:29 +
> > >
> > >
> > >Dear Ming,
> > >
> > >Really thanks for reply,
> > >
> > >about thresholds and waitbound OK! I'll adjust them in adapter.c ,
> > >
> > >but what about enabling jumbo frames? should I do any thing special to
> > enable Jumbo fram support?
> > >
> > >we were thinking that it is enabled by default. Is it?
> > >
> > >thanks
> > >
> > >
> > >
> > >
> > >
> > > > From: [EMAIL PROTECTED]
> > > > To: [EMAIL PROTECTED]; [EMAIL PROTECTED];
> > linuxppc-embedded@ozlabs.org; [EMAIL PROTECTED]
> > > > Subject: RE: ML403 gigabit ethernet bandwidth - 2.6 kernel
> > > > Date: Sat, 23 Jun 2007 18:48:19 +
> > > >
> > > > Dear Mohammad,
> > > > There are some parameters which could be adjusted to improve the
> > > > performance. They are: TX and RX_Threshold TX and RX_waitbound. In 
my
> > > > system, we use TX_Threshold=16 and Rx_Threshold=8 and both 
waitbound=1.
> > > >
> > > > Also Jumbo frame of 8982 could be enable.
> > > >
> > > > Try those hints and share your improvement with us.
> > > >
> > > > BR
> > > > Ming
> > > >
> > > > >From: Mohammad Sadegh Sadri
> > > > >To: Andrei Konovalov , Linux PPC Linux
> > > > PPC, Grant Likely
> > > > >Subject: ML403 gigabit ethernet bandwidth - 2.6 kernel
> > > > >Date: Sat, 23 Jun 2007 12:19:12 +
> > > > >
> > > > >
> > > > >Dear all,
> > > > >
> > > > >Recently we did a set of tests on performance of virtex 4FX hard 
TEMAC
> > > > module using ML403
> > > > >
> > > > >we studied all of the posts here carefully: these are the system
> > > > characteristics;
> > > > >
> > > > >Board : ML403
> > > > >EDK: EDK9.1SP2
> > > > >Hard TEMAC version and PLTEMAC version are both 3.0.a
> > > > >PPC clock frequency :  300MHz
> > > > >Kernel : 2.6.21-rc7 , downloaded from grant's git tree some thing 
near
> > one
> > > > week ago
> > > > >DMA type: 3 (sg dma)
> > > > >DRE : enabled for TX and RX, (2)
> > > > >CSUM offload is enabled for both of TX and RX
> > > > >tx and rx fifo sizes : 131072 bits
> > > > >
> > > > >the board comes up over NFS root file system completely and 
without
> > any
> > > > problems.
> > > > >
> > > > >PC system used for these tests is : CPU P4 Dual Core, 3.4GHz ,
> > 2Gigabytes
> > > > memory, Dual gigabit ethernet port, running linux 2.6.21.3
> > > > >We have tested the PC system band width and it can easily reach
> > 966mbits/s
> > > > when connected to the same PC. ( using the same cross cable used 
for
> > ml403
> > > > test)
> > > > >
> > > > >Netperf is compiled with TCP SEND FILE enabled, ( -DHAVE_SENDFILE)
> > > > >
> > > > >(from board to PC)
> > > > >netperf -t TCP_SENDFILE -H 10.10.10.250 -F /boot/zImage.elf -- -m
> > 16384 -s
> > > > 87380 -S 87380
> > > > >
> > > > >the measured bandwidth for this test was just 40.66Mbits.
> > > > >It is also true for netperf from PC to board.
> > > > >
> > > > >we do not have any more idea about what we should do to improve 
the
> > > > bandwidth.
> > > > >any help or ideas is appreciated...
> > > > >
> > > > >_
> > > > >Connect to the next generation of MSN
> > > >
> > 
Messenger?>http://imagine-msn.com/messenger/launch80/default.aspx?locale=en-us&source=wlmailtagline

> >
> > > >
> > > > >___
> > > > >Linuxppc-embedded mailing list
> > > > >Linuxppc-embedded@ozlabs.org
> > > > >https://ozlabs.org/mailman/listinfo/linuxppc-embedded
> > > >
> > > > _
> > > > Free download of MSN Explorer:   http://explorer.msn.com/lccn/
> > > >
> > >
> > >___

Re: Mem-2-Mem DMA - Generalized API

2007-06-25 Thread Clifford Wolf
Hi,

On Sun, Jun 24, 2007 at 10:21:57PM +0200, Arnd Bergmann wrote:
> On Sunday 24 June 2007, Clifford Wolf wrote:
> > I'm working on an MPC8349E based project and as some of you might know this
> > chip has a four channel (bus-) memory-to-memory DMA controller.
> > 
> > Unfortunately the linux kernel is atm lacking a generic interface for such
> > DMA controllers.
> 
> So what's wrong with the include/linux/dmaengine.h API? I thought it was
> designed to cover this sort of DMA controller?

Nothing. I was just too blind to find it..  ;-)

Though there are some points:

At first glance it seems like this API does not support scatter/gather
and FIFO mode, right? In fact that's no problem at all for my project, but
it would be a pity to lose that hardware functionality because of the API.

I have also had a quick look at the ioatdma driver, and it appears to me
that it can only operate on address regions which are visible on the PCI
bus. The MPC8349E DMA can operate on everything which is visible on the
coherent local bus, i.e. everything that is also visible to the CPU. There
seems to be no way to specify the bus a DMA channel is needed for when
requesting a channel through this interface.

It also appears to me that the dmaengine.h API is not capable of
overcommitting, i.e. assigning a small pool of DMA channels to a big pool
of drivers in the hope that not all of the drivers are doing DMA transfers
at the same time (and scheduling transfers if this assumption turns out to
be wrong).

Wouldn't it be better to let the backend handle stuff like binding DMA
channels to specific CPUs, and let the user just submit DMA requests which
are then scheduled to the DMA channel that fits the needs best (or done on
the CPU if no DMA channel exists which would be capable of doing this kind
of transfer)?
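
To make that concrete, a submit-style interface might look roughly like
this - names invented for illustration, not taken from dmaengine.h:

    /* Hypothetical sketch: the caller never owns a channel.  The backend
     * queues the request, dispatches it to whichever channel can reach the
     * buses involved, and falls back to a CPU copy if no suitable channel
     * exists (or if the pool is overcommitted for too long). */
    #include <linux/types.h>

    struct ex_dma_req;      /* opaque handle owned by the backend */

    struct ex_dma_req *ex_dma_submit_memcpy(u64 dst_bus, u64 src_bus, size_t len,
                                            void (*done)(void *ctx, int status),
                                            void *ctx);

    /* Block until the request (queued, running, or CPU-copied) has finished. */
    int ex_dma_wait(struct ex_dma_req *req);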

yours,
 - clifford

-- 
"The generation of random numbers is too important to be left to chance."
 - Robert R. Coveyou, Oak Ridge National Laboratory.
___
Linuxppc-embedded mailing list
Linuxppc-embedded@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-embedded