RE: ML403 gigabit ethernet bandwidth - 2.6 kernel
I was able to achieve ~320 Mbit/sec data rates using the Gigabit System Reference Design (GSRD, XAPP535/536) from Xilinx. It uses the LocalLink TEMAC to perform the transfers. The reference design provides Linux 2.4 drivers that can be ported to Linux 2.6 with a little effort. This implementation did not use checksum offloading, and the data rates were measured with netperf's TCP_STREAM test.

Greg

___
Linuxppc-embedded mailing list
Linuxppc-embedded@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-embedded
RE: Follow up on 4 Gig of DDR on MPC8548E
> Apparently, after more investigation, it looks like there is something in
> the ext2 driver code that is maladjusted. I haven't talked to the guy today
> who was looking at that, but the ext2 driver code that was opening a
> 'virtual file' / console had some problems mapping that space. Again, my
> gut is telling me ever more strongly that there is a problem with
> signed/unsigned... now deeper in the ext2 code...

I very much doubt there is a signed/unsigned issue in ext2 or elsewhere... Do you have CONFIG_HIGHMEM? What are your settings for KERNELBASE and PAGE_OFFSET?

Ben.
RE: Follow up on 4 Gig of DDR on MPC8548E
>> 0x_ to 0x_ or do you have 64GB of mem...

4 Gig of DDR memory... that is to say, 3 Gig for the moment. The LAWs are set up in priority order; if you have overlapping regions (e.g. PCI & DDR memory), the one with the lower-numbered LAW is the one the address is mapped to.

>> If you have all physical mem in the first 32bit, where are your PCI
>> windows set?
>> And in most cases the PCI devices (if they are bus-mastering) need 1-1
>> inbound mapping to be able to write to memory.

You can use the LAWs to remap anything to anywhere... see above for priority. In our case, we are mapping the PCI/PEX/Local Bus to the last 4 Gig at a higher-priority LAW than the DDR. That is how the MPC8548 spec says it works, and it does, if you don't tell the kernel there is more than 2 Gig (and ioremap that last Gig to access the full 'other' 2 Gig)... that is not the issue here.

Apparently, after more investigation, it looks like there is something in the ext2 driver code that is maladjusted. I haven't talked to the guy today who was looking at that, but the ext2 driver code that was opening a 'virtual file' / console had some problems mapping that space. Again, my gut is telling me ever more strongly that there is a problem with signed/unsigned... now deeper in the ext2 code...

I am at the Freescale Technical Forum the next few days; if any of you guys are down here in Florida, I am intending to track down a few Freescale folk on this issue... :-)

Tom
Re: Mem-2-Mem DMA - Generalized API
Clifford Wolf wrote:
> Hi,
>
> However, I don't think that implementing stuff like memset() in a dma
> controller is any good, because that would just flood the memory bus,
> which would then in 99% of all cases block the cpu until the dma is
> finished.
>
> It would however cost less power than doing that in the CPU. ;-)

At least while the DMA transfer is happening, you could preempt to some other task. Would it flood the memory bus? When a DMA transfer happens, would it really do so in such a way that it would stall the CPU on a memory access far more than usual? I think it would have to be extremely badly designed to even be ABLE to do that, or at least you'd be doing some incredibly unwise things to be able to flood it like that.

>> Indeed. I wonder if we could pull apart the IOAT/DMA stuff and genericise
>> it (it should be possible) or simply add to it, or if making a powerpc
>> specific dma engine abstraction would be an easier idea.
>
> I don't think that this would actually be powerpc specific in any way. But
> since such general purpose dma controllers are more common in embedded
> hardware, this still seems to be the right place to discuss the issue.

I meant powerpc platform (as in ARCH=powerpc) specific. Rather than dropping it in the global drivers, just keep it as a library for powerpc. Everyone else can get it later with a move into the full tree. As long as the headers have common, easy-to-find names that do not conflict with anything preexisting, it would not affect anything.

Taking IOAT as an example and fixing its weirdness would be a better start than making a whole new API. But doing development *WITH* IOAT, potentially trashing all.. umm.. 4 users, and paying the weird Linux-kernel tripling of development cost that comes with heavily updating an actively maintained subsystem that everyone else wants to touch, would be detrimental.
We don't want to break Intel, and we don't want to be tracking Intel's patches or having extra weirdness break in (or the number of users of that DMA system exploding underneath New DMA Development).

--
Matt Sealey <[EMAIL PROTECTED]>
Genesi, Manager, Developer Relations
Katmai w/ DENX git - "init has generated signal 4" error
I'm using the DENX linux-2.6-denx.git repository with a Katmai board. I want to boot off a disk (initialized with Debian 4.0). I've installed a Promise Ultra133 Tx2 IDE controller card in the PCI slot and configured it in the kernel. I boot this same disk and IDE card on other 4xx boards without a problem (Bamboo with DENX 4.1, for example). I am getting the error "init has generated signal 4 but has no handler for it".

## Booting image at 0020 ...
   Image Name:   Linux-2.6.22-rc5-gc8144983-dirty
   Image Type:   PowerPC Linux Kernel Image (gzip compressed)
   Data Size:    1328285 Bytes = 1.3 MB
   Load Address:
   Entry Point:
   Verifying Checksum ... OK
   Uncompressing Kernel Image ... OK
Linux version 2.6.22-rc5-gc8144983-dirty ([EMAIL PROTECTED]) (gcc version 4.0.0 (DENX ELDK 4.1 4.0.0)) #4 Mon Jun 25 13:59:54 EDT 2007
AMCC PowerPC 440SPe Katmai Platform
Zone PFN ranges:
  DMA             0 ->   131072
  Normal     131072 ->   131072
early_node_map[1] active PFN ranges
    0:        0 ->   131072
Built 1 zonelists.  Total pages: 130048
Kernel command line: console=ttyS0,115200 root=/dev/hde3 rw ip=dhcp
PID hash table entries: 2048 (order: 11, 8192 bytes)
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 516864k available (2008k kernel code, 676k data, 160k init, 0k highmem)
Mount-cache hash table entries: 512
NET: Registered protocol family 16
PCI: Probing PCI hardware
SCSI subsystem initialized
NET: Registered protocol family 2
IP route cache hash table entries: 16384 (order: 4, 65536 bytes)
TCP established hash table entries: 65536 (order: 7, 524288 bytes)
TCP bind hash table entries: 65536 (order: 6, 262144 bytes)
TCP: Hash tables configured (established 65536 bind 65536)
TCP reno registered
io scheduler noop registered
io scheduler anticipatory registered (default)
io scheduler deadline registered
io scheduler cfq registered
Generic RTC Driver v1.07
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at MMIO 0x0 (irq = 0) is a 16550A
serial8250: ttyS1 at MMIO 0x0 (irq = 1) is a 16550A
serial8250: ttyS2 at MMIO 0x0 (irq = 37) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
xsysace xsysace.0: Xilinx SystemACE revision 1.0.12
xsysace xsysace.0: capacity: 256512 sectors
 xsa: xsa1
Intel(R) PRO/1000 Network Driver - version 7.3.20-k2
Copyright (c) 1999-2006 Intel Corporation.
PPC 4xx OCP EMAC driver, version 3.54
mal0: initialized, 1 TX channels, 1 RX channels
eth0: emac0, MAC 00:01:73:77:55:27
eth0: found Generic MII PHY (0x01)
e100: Intel(R) PRO/100 Network Driver, 3.5.17-k4-NAPI
e100: Copyright(c) 1999-2006 Intel Corporation
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PDC20269: IDE controller at PCI slot :00:01.0
PDC20269: chipset revision 2
PDC20269: ROM enabled at 0x000dc000
PDC20269: PLL input clock is 33309 kHz
PDC20269: 100% native mode on irq 52
    ide2: BM-DMA at 0xffd0-0xffd7, BIOS settings: hde:pio, hdf:pio
    ide3: BM-DMA at 0xffd8-0xffdf, BIOS settings: hdg:pio, hdh:pio
hde: FUJITSU MHT2040AH, ATA DISK drive
hde: host side 80-wire cable detection failed, limiting max speed to UDMA33
ide2 at 0xfff8-0x,0xfff6 on irq 52
hde: max request size: 128KiB
hde: 78140160 sectors (40007 MB) w/8192KiB Cache, CHS=65535/16/63, UDMA(33)
hde: cache flushes supported
 hde: hde1 hde2 hde3
i2c /dev entries driver
IBM IIC driver v2.1
ibm-iic0: using standard (100 kHz) mode
ds1307 1-0068: rtc core: registered ds1307 as rtc0
ibm-iic1: using standard (100 kHz) mode
TCP cubic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
ds1307 1-0068: setting the system clock to 2000-04-30 08:13:25 (957082405)
eth0: link is up, 100 FDX
Sending DHCP requests ., OK
IP-Config: Got DHCP answer from 255.255.255.255, my address is 9.27.218.226
IP-Config: Complete:
      device=eth0, addr=9.27.218.226, mask=255.255.255.128, gw=9.27.218.129,
      host=9.27.218.226, domain=raleigh.ibm.com, nis-domain=(none),
      bootserver=255.255.255.255, rootserver=255.255.255.255, rootpath=
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hde3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem).
Freeing unused kernel memory: 160k init
init has generated signal 4 but has no handler for it
Kernel panic - not syncing: Attempted to kill init!
Rebooting in 180 seconds..

Not sure if I'm missing something obvious; any help appreciated.

Thanks,
Steve
Re: Mem-2-Mem DMA - Generalized API
Hi,

On Mon, Jun 25, 2007 at 03:31:59PM +0100, Matt Sealey wrote:
> > The main question remains: Is it possible to have a flexible cross platform
> > DMA API which handles even complex requests and does scheduling,
> > prioritizing, queuing, locking, (re-)building/caching of SG lists...
> > automagically.
>
> I would think so. I think there is a fairly generic example in many parts
> of the Linux kernel. Dare I say the Via Unichrome AGP subsystem? And a
> bunch of the ARM/OMAP platforms..? A lot of the code is even identical,
> I wonder why it isn't some library rather than platform drivers.

I've put a 'draft header file' of an API as I would have expected it online:

http://www.clifford.at/priv/dmatransfer.h

I'd love to hear your feedback on it.

One issue I'm absolutely not sure about atm is the different buses and their address spaces. The design in the header file works directly on 'bus addresses' (the thing accessible through /dev/mem). Does anyone know a case where this may be insufficient?

> > Filling memory with zero is also a simple task for a DMA engine.
> > (Thinking about malloc() and memset())
>
> Also xor and logical operations, byte swapping huge chunks of data, that
> kind of thing. Most DMA engines in SoCs have cute features like that. I
> think BestComm can even calculate CRCs for IP packets.

I haven't added it yet, but such things could be encoded using the DMATRANSFER_CHUNK_* and DMATRANSFER_* flags.

However, I don't think that implementing stuff like memset() in a dma controller is any good, because that would just flood the memory bus, which would then in 99% of all cases block the cpu until the dma is finished.

It would however cost less power than doing that in the CPU. ;-)

> Indeed. I wonder if we could pull apart the IOAT/DMA stuff and genericise
> it (it should be possible) or simply add to it, or if making a powerpc
> specific dma engine abstraction would be an easier idea.

I don't think that this would actually be powerpc specific in any way. But since such general purpose dma controllers are more common in embedded hardware, this still seems to be the right place to discuss the issue.

yours,
 - clifford

--
Relax, its only ONES and ZEROS!
Re: Mem-2-Mem DMA - Generalized API
Hi,

On Mon, Jun 25, 2007 at 12:00:03PM -0500, Olof Johansson wrote:
> That's the case with the dma engine framework today. Expand it, extend
> it, fix it and improve it. Don't duplicate, re-invent and re-implement.

I'm not sure I can agree with that. The core idea behind the dma engine framework seems to be to statically assign dma channels to device drivers. I think that the channels should be dynamically assigned. IMO, writing an alternative implementation is much easier than hacking that into the existing framework, especially since the existing framework has only one backend driver and only one user.

yours,
 - clifford

--
/* You are not expected to understand this */
RE: Follow up on 4 Gig of DDR on MPC8548E
> From: Morrison, Tom
> Sent: Friday, June 22, 2007 12:50 PM
> Setup:
> 1) 4 Gig of DDR RAM (at physical addresses 0-0xF__)

0x_ to 0x_ or do you have 64GB of mem...

> >> EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended
> >> VFS: Mounted root (ext2 filesystem).
> >> 316k init
> >> EXT2-fs error (device sda1): ext2_check_page: bad entry in directory #2:
> >> unaligned directory entry - offset=0, inode=128, rec_len=8961, name_len=69
> >> Warning: unable to open an initial console.

> My gut tells me this might be something to do with the 2 Gig boundary and
> specifically a "signed" versus "unsigned" address/offset mismatch, maybe
> somewhere in the file system??

If you have all physical mem in the first 32 bits, where are your PCI windows set? And in most cases the PCI devices (if they are bus-mastering) need a 1-1 inbound mapping to be able to write to memory.
Re: Mem-2-Mem DMA - Generalized API
On Mon, Jun 25, 2007 at 03:31:59PM +0100, Matt Sealey wrote:
>
> Indeed. I wonder if we could pull apart the IOAT/DMA stuff and genericise
> it (it should be possible) or simply add to it, or if making a powerpc
> specific dma engine abstraction would be an easier idea.

It's hard to anticipate all possible uses of a framework when you first write it. So, when you first write it with one device in mind, it's fairly obvious that it might not fit well with the second device that will use it. That's the case with drivers/dma and IOAT, the dma engine framework, today.

Expand it, extend it, fix it and improve it. Don't duplicate, re-invent and re-implement.

-Olof
RE: ML403 gigabit ethernet bandwidth - 2.6 kernel
All,

I am also very interested in the network throughput. I am using the Avnet Mini-Module, which has a V4FX12; the ML403 is very close to the Mini-Module. I am getting a throughput of about 100 Mbps. The biggest difference was turning on the cache; 100 MHz vs. 300 MHz only improved the performance slightly. Using the checksum offloading was also a big help in getting the throughput up. The RX threshold also helped, but jumbo frames did not seem to help. I am not sure what I can do to get the 300 Mbps Ming is getting. I saw on a previous post that someone was using a 128k FIFO depth; I am using a 32k depth.

Glenn

"Ming Liu" <[EMAIL PROTECTED]> wrote:

Dear Mohammad,

>The results are as follows:
>PC-->ML403
>TCP_SENDFILE : 38Mbps
>
>ML403--->PC
>TCP_SENDFILE: 155Mbps

This result is unreasonable. Because the PC is more powerful than your board, PC->board should be faster than board->PC.

>The transfer rate from ML403 to PC has improved by a factor of 2.
>I see on the posts here in the mailing list that you have reached a
>bandwidth of 301Mbps.

Yes, with all the features which could improve performance enabled, we can get around 300Mbps for TCP transfer. One more hint: did you enable caches on your system? Perhaps it will help. Anyway, double-check your hardware design to make sure all features are enabled. That's all I can suggest.

BR
Ming
UART for MPC5200 with DTR, DSR, RI and DCD?
Hi,

Does anybody know of a UART driver for the MPC5200 that supports use of the DTR, DSR, RI and DCD signals?

Regards,
Mattias
Re: Mem-2-Mem DMA - Generalized API
Clemens Koller wrote: > Hello, Matt! > >> There is so much you can do with most SoC DMA controllers, and it's not >> even limited to PowerPC (most ARM/XScale SoCs have very capable devices >> inside too). I can only imagine that nobody got excited over IOAT because >> the entire programming interface stinks of "offloading gigabit ethernet" >> and not much else. > > The main question remains: Is it possible to have a flexible cross platform > DMA API which handles even complex requests and does scheduling, > prioritizing, queuing, locking, (re-)building/caching of SG lists... > automagically. I would think so. I think there is a fairly generic example in many parts of the Linux kernel. Dare I say the Via Unichrome AGP subsystem? And a bunch of the ARM/OMAP platforms..? A lot of the code is even identical, I wonder why it isn't some library rather than platform drivers. > Filling memory with zero is also a simple task for a DMA engine. > (Thinking about malloc() and memset()) Also xor and logical operations, byte swapping huge chunks of data, that kind of thing. Most DMA engines in SoCs have cute features like that. I think BestComm can even calculate CRCs for IP packets. > The problem is IMHO similar to video acceleration. Within the > Xorg's XAA/EXA/whatever framework, the drivers accelerate certain > calls if the hardware has the capability to do so. Other calls fall back > to some default non accelerated memcpy() & friends. > > Sounds like a lot of fun... replacing kernel's and libc's memcpy() with > memcpy_with_dma_if_possible(). :-) Indeed. I wonder if we could pull apart the IOAT/DMA stuff and genericise it (it should be possible) or simply add to it, or if making a powerpc specific dma engine abstraction would be an easier idea. Probably the latter to be merged with the former at a later date would be easier to manage. Take inspiration but don't be bound by Intel's weird "new" (i.e. 15 year old) concept? 
--
Matt Sealey <[EMAIL PROTECTED]>
Genesi, Manager, Developer Relations
Re: Mem-2-Mem DMA - Generalized API
Hello, Matt!

Matt Sealey schrieb:
> IOAT and Intel's DMA engine driver is very IOAT specific in places..
>
> I had a peek at it as I have a little interest in the concept; at least the
> two platforms Genesi has been supporting (Pegasos and Efika) have quite
> competent DMA engines which are woefully underused (i.e. not at all).

True.

> There exists a Marvell DMA driver somewhere (I have a copy, someone on
> this list posted it about a year ago) and while the MPC5200B doesn't have
> explicit support for DMA from memory to memory (although memory to SRAM
> might work in chunks, or memory to a FIFO wired as a loopback like in
> the docs..??)
>
> There is so much you can do with most SoC DMA controllers, and it's not
> even limited to PowerPC (most ARM/XScale SoCs have very capable devices
> inside too). I can only imagine that nobody got excited over IOAT because
> the entire programming interface stinks of "offloading gigabit ethernet"
> and not much else.

The main question remains: is it possible to have a flexible cross-platform DMA API which handles even complex requests and does scheduling, prioritizing, queuing, locking, (re-)building/caching of SG lists... automagically?

It could fall back to the CPU's memcpy if the hardware doesn't have the ability to use the DMA machine: because all channels are already busy, or the requested memory isn't DMAable, or the request is just too small for it to make sense to set up a DMA channel.

Filling memory with zero is also a simple task for a DMA engine. (Thinking about malloc() and memset())

The problem is IMHO similar to video acceleration. Within Xorg's XAA/EXA/whatever framework, the drivers accelerate certain calls if the hardware has the capability to do so. Other calls fall back to some default non-accelerated memcpy() & friends.

Sounds like a lot of fun... replacing the kernel's and libc's memcpy() with memcpy_with_dma_if_possible().
:-)

Best regards,
--
Clemens Koller
R&D Imaging Devices
Anagramm GmbH
Rupert-Mayer-Straße 45/1
Linhof Werksgelände
D-81379 München
Tel. 089-741518-50
Fax 089-741518-19
http://www.anagramm-technology.com
Re: ML403 gigabit ethernet bandwidth - 2.6 kernel
Hi,

We need to find out where the bottleneck is.

1. Run vmstat on the ML403 board and find out what percentage of the CPU is busy while you are transferring the file. That will show whether the CPU is busy or not.

2. Run oprofile and find out which routines are eating the CPU time.

Once we have data from both of the above, we can find the bottlenecks.

Regards
Bhupi

On 6/23/07, Mohammad Sadegh Sadri <[EMAIL PROTECTED]> wrote:

Dear all,

Recently we did a set of tests on the performance of the Virtex 4FX hard TEMAC module using the ML403. We studied all of the posts here carefully. These are the system characteristics:

Board: ML403
EDK: EDK9.1SP2
Hard TEMAC version and PLTEMAC version are both 3.0.a
PPC clock frequency: 300MHz
Kernel: 2.6.21-rc7, downloaded from Grant's git tree something near one week ago
DMA type: 3 (sg dma)
DRE: enabled for TX and RX (2)
CSUM offload is enabled for both TX and RX
TX and RX FIFO sizes: 131072 bits

The board comes up over the NFS root file system completely and without any problems.

The PC system used for these tests is: CPU P4 Dual Core, 3.4GHz, 2 Gigabytes memory, dual gigabit ethernet port, running Linux 2.6.21.3. We have tested the PC system bandwidth and it can easily reach 966mbits/s when connected to the same PC (using the same cross cable used for the ML403 test).

Netperf is compiled with TCP SEND FILE enabled (-DHAVE_SENDFILE).

(from board to PC)
netperf -t TCP_SENDFILE -H 10.10.10.250 -F /boot/zImage.elf -- -m 16384 -s 87380 -S 87380

The measured bandwidth for this test was just 40.66Mbits. It is also true for netperf from PC to board. We do not have any more ideas about what we should do to improve the bandwidth. Any help or ideas are appreciated...
Re: Mem-2-Mem DMA - Generalized API
IOAT and Intel's DMA engine driver is very IOAT specific in places..

I had a peek at it as I have a little interest in the concept; at least the two platforms Genesi has been supporting (Pegasos and Efika) have quite competent DMA engines which are woefully underused (i.e. not at all).

There exists a Marvell DMA driver somewhere (I have a copy; someone on this list posted it about a year ago), and while the MPC5200B doesn't have explicit support for DMA from memory to memory (although memory to SRAM might work in chunks, or memory to a FIFO wired as a loopback like in the docs..??)

There is so much you can do with most SoC DMA controllers, and it's not even limited to PowerPC (most ARM/XScale SoCs have very capable devices inside too). I can only imagine that nobody got excited over IOAT because the entire programming interface stinks of "offloading gigabit ethernet" and not much else.

--
Matt Sealey <[EMAIL PROTECTED]>
Genesi, Manager, Developer Relations

Arnd Bergmann wrote:
> On Sunday 24 June 2007, Clifford Wolf wrote:
>> I'm working on an MPC8349E based project and as some of you might know this
>> chip has a four channel (bus-) memory-to-memory DMA controller.
>>
>> Unfortunately the linux kernel is atm lacking a generic interface for such
>> DMA controllers.
>
> So what's wrong with the include/linux/dmaengine.h API? I thought it was
> designed to cover this sort of DMA controller?
>
> Arnd <><
RE: ML403 gigabit ethernet bandwidth - 2.6 kernel
Dear Mohammad,

>The results are as follows:
>PC-->ML403
>TCP_SENDFILE : 38Mbps
>
>ML403--->PC
>TCP_SENDFILE: 155Mbps

This result is unreasonable. Because the PC is more powerful than your board, PC->board should be faster than board->PC.

>The transfer rate from ML403 to PC has improved by a factor of 2.
>I see on the posts here in the mailing list that you have reached a
>bandwidth of 301Mbps.

Yes, with all the features which could improve performance enabled, we can get around 300Mbps for TCP transfer. One more hint: did you enable caches on your system? Perhaps it will help. Anyway, double-check your hardware design to make sure all features are enabled. That's all I can suggest.

BR
Ming

> > From: [EMAIL PROTECTED]
> > Subject: RE: ML403 gigabit ethernet bandwidth - 2.6 kernel
> > Date: Sat, 23 Jun 2007 19:10:16 +
> >
> > Use the following command in Linux please:
> >
> > ifconfig eth0 mtu 8982
> >
> > As well you should do that on your PC in the measurement.
> >
> > Ming
> >
> > >From: Mohammad Sadegh Sadri
> > >Subject: RE: ML403 gigabit ethernet bandwidth - 2.6 kernel
> > >Date: Sat, 23 Jun 2007 19:08:29 +
> > >
> > >Dear Ming,
> > >
> > >Really thanks for the reply.
> > >
> > >About thresholds and waitbound, OK! I'll adjust them in adapter.c.
> > >But what about enabling jumbo frames? Should I do anything special to
> > >enable jumbo frame support? We were thinking that it is enabled by
> > >default. Is it?
> > >
> > >thanks
> > >
> > > > From: [EMAIL PROTECTED]
> > > > Subject: RE: ML403 gigabit ethernet bandwidth - 2.6 kernel
> > > > Date: Sat, 23 Jun 2007 18:48:19 +
> > > >
> > > > Dear Mohammad,
> > > > There are some parameters which could be adjusted to improve the
> > > > performance. They are: TX and RX_Threshold, TX and RX_waitbound. In my
> > > > system, we use TX_Threshold=16 and RX_Threshold=8 and both waitbound=1.
> > > >
> > > > Also a jumbo frame of 8982 could be enabled.
> > > >
> > > > Try those hints and share your improvement with us.
> > > >
> > > > BR
> > > > Ming
Re: Mem-2-Mem DMA - Generalized API
Hi,

On Sun, Jun 24, 2007 at 10:21:57PM +0200, Arnd Bergmann wrote:
> On Sunday 24 June 2007, Clifford Wolf wrote:
> > I'm working on an MPC8349E based project and as some of you might know this
> > chip has a four channel (bus-) memory-to-memory DMA controller.
> >
> > Unfortunately the linux kernel is atm lacking a generic interface for such
> > DMA controllers.
>
> So what's wrong with the include/linux/dmaengine.h API? I thought it was
> designed to cover this sort of DMA controller?

Nothing. I was just too blind to find it.. ;-)

Though there are some points: at first glance it seems like this API does not support scatter/gather and FIFO mode, right? In fact that's no problem at all for my project, but it would be a pity to lose that hardware functionality because of the API.

I have also had a quick look at the ioatdma driver, and it appears to me that it can only operate on address regions which are visible on the PCI bus. The MPC8349E DMA can operate on everything which is visible on the coherent local bus, i.e. everything that is also visible to the CPU. There seems to be no way to specify the bus a DMA channel is needed for when requesting a channel through this interface.

It also appears to me that the dmaengine.h API is not capable of overcommitting, i.e. assigning a small pool of DMA channels to a big pool of drivers in the hope that not all of the drivers are doing DMA transfers at the same time (and scheduling transfers if this assumption turns out to be wrong).

Wouldn't it be better to let the backend handle stuff like binding DMA channels to specific CPUs, and let the user just commit DMA requests, which are then scheduled to the DMA channel that fits the needs best (or done on the CPU if no DMA channel exists which would be capable of doing this kind of transfer)?

yours,
 - clifford

--
"The generation of random numbers is too important to be left to chance."
 - Robert R. Coveyou, Oak Ridge National Laboratory.