RE: Memory allocation modifications in ibm_newemac driver AND sil24 driver
Can anyone explain to me why I would be getting this error in the first place? Why is it failing to allocate a page when there are pages available? That does not make any sense to me.

order:1 It's failing to allocate -two- pages.

Good point. However, why is it failing? According to the dump, there are 28 8k pages available.

From more testing yesterday, I found that this error is also coming from the SATA driver (sil24). Again, in those errors it fails to allocate an order-0 page when there are a few hundred of those available. What is up with the VMM? Why are page allocations failing when pages are available?

Thanks,
Jonathan
___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: Memory allocation modifications in ibm_newemac driver
I found out what was causing the crash, but I am still not there and could use some direction.

What was happening was that I was not allocating a new SKB to replace the one in the ring that was being passed up the stack. I have remedied that and now have another issue: once the ring index rolls over (it does so at 64), I start to lose packets because they are not being handled correctly (or do not contain the correct headers, or something of that sort). Here is a simple ping test showing what is happening:

64 bytes from 172.31.22.21: seq=29 ttl=128 time=10.826 ms
emac/plb/opb/ether...@ef600900: PACKET: 54 0X1800 110 0XC04E6B80
emac/plb/opb/ether...@ef600900: PACKET: 55 0X1800 110 0XC04E6EC0
emac/plb/opb/ether...@ef600900: PACKET: 56 0X1800 98 0XC04E70C0
64 bytes from 172.31.22.21: seq=30 ttl=128 time=10.839 ms
emac/plb/opb/ether...@ef600900: PACKET: 57 0X1800 110 0XC04E6580
emac/plb/opb/ether...@ef600900: PACKET: 58 0X1800 98 0XC04E5B80
64 bytes from 172.31.22.21: seq=31 ttl=128 time=10.832 ms
emac/plb/opb/ether...@ef600900: PACKET: 59 0X1800 219 0XC04E5740
emac/plb/opb/ether...@ef600900: PACKET: 60 0X1800 249 0XC04E5000
emac/plb/opb/ether...@ef600900: PACKET: 61 0X1800 92 0XC04E5340
emac/plb/opb/ether...@ef600900: PACKET: 62 0X1800 98 0XC04E4A00
64 bytes from 172.31.22.21: seq=32 ttl=128 time=10.825 ms
emac/plb/opb/ether...@ef600900: PACKET: 63 0X5800 92 0XC04E4E00
emac/plb/opb/ether...@ef600900: PACKET: 0 0X1800 98 0XC04EF520
emac/plb/opb/ether...@ef600900: PACKET: 1 0X1800 92 0XC04D8340
emac/plb/opb/ether...@ef600900: PACKET: 2 0X1800 98 0XC04D8260
emac/plb/opb/ether...@ef600900: PACKET: 3 0X1800 60 0XC04E7AA0
emac/plb/opb/ether...@ef600900: PACKET: 4 0X1800 92 0XC04E74A0
emac/plb/opb/ether...@ef600900: PACKET: 5 0X1800 92 0XC04E6BC0
emac/plb/opb/ether...@ef600900: PACKET: 6 0X1800 98 0XC04E6F80
emac/plb/opb/ether...@ef600900: PACKET: 7 0X1800 92 0XC04E64E0
emac/plb/opb/ether...@ef600900: PACKET: 8 0X1800 98 0XC04E5BC0

The first number in my debug print statement is what the driver calls the slot number (the ring index). When it rolls over, I start losing the ping replies. What have I done wrong to cause that? The data is coming in and is there. Have any other network device developers seen similar behavior?

Thanks,
Jonathan
RE: Memory allocation modifications in ibm_newemac driver
Okay, I think I have all the issues worked out and can now send and receive packets of any size without a hiccup. I have also tested this in our system setup, with data being written out to disk, and did not see any problems there either (since the driver only ever allocates a single page, never more).

Is this something that would be wanted in mainline? I have not run full benchmarks, but I expect my modified driver to be slightly slower than the mainline driver, because it keeps both an SKB ring and a ring of pages and allocates from both for each packet received.

Thanks,
Jonathan
RE: Memory allocation modifications in ibm_newemac driver
Apparently I spoke too soon - sorry about that. I am still getting the error when I try to write to disk and receive on the network at the same time. Here is the output:

blastee: page allocation failure. order:1, mode:0x4020
Call Trace:
[ccea9a40] [c0006ef0] show_stack+0x44/0x16c (unreliable)
[ccea9a80] [c006f9f0] __alloc_pages_nodemask+0x38c/0x4f8
[ccea9b00] [c0095008] __slab_alloc+0x594/0x5e0
[ccea9b40] [c0095a08] __kmalloc_track_caller+0xe8/0xf0
[ccea9b60] [c01c848c] __alloc_skb+0x60/0x140
[ccea9b80] [c01a7df8] emac_poll_rx+0x568/0x768
[ccea9bc0] [c01a28e4] mal_poll+0xa8/0x1ec
[ccea9bf0] [c01d3eec] net_rx_action+0x9c/0x1b4
[ccea9c20] [c003b3c0] __do_softirq+0xc4/0x148
[ccea9c60] [c0004d18] do_softirq+0x78/0x80
[ccea9c70] [c003b67c] local_bh_enable+0xc0/0xd8
[ccea9c80] [c01c29bc] lock_sock_nested+0xc0/0xdc
[ccea9cc0] [c0212cb4] udp_recvmsg+0x318/0x3a4
[ccea9d10] [c01c2334] sock_common_recvmsg+0x3c/0x60
[ccea9d30] [c01c06c4] sock_recvmsg+0xb8/0xf0
[ccea9e20] [c01c09b0] sys_recvfrom+0x8c/0xfc
[ccea9f00] [c01c18d4] sys_socketcall+0x128/0x1f8
[ccea9f40] [c000f434] ret_from_syscall+0x0/0x3c
Mem-Info:
DMA per-cpu:
CPU0: hi: 90, btch: 15 usd: 48
Active_anon:28 active_file:807 inactive_anon:85
 inactive_file:171 unevictable:0 dirty:0 writeback:0 unstable:0
 free:506 slab:53530 mapped:362 pagetables:19 bounce:0
DMA free:2024kB min:2036kB low:2544kB high:3052kB active_anon:112kB inactive_anon:340kB active_file:3228kB inactive_file:684kB unevictable:0kB present:260096kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 410*4kB 28*8kB 4*16kB 1*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2024kB
978 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
65536 pages RAM
1400 pages reserved
1001 pages shared
62828 pages non-shared
SLUB: Unable to allocate memory on node -1 (gfp=0x20)
  cache: kmalloc-8192, object size: 8192, buffer size: 8192, default order: 3, min order: 1
  node 0: slabs: 7140, objs: 25809, free: 0

Can anyone explain to me why I would be getting this error in the first place? Why is it failing to allocate a page when there are pages available? That does not make any sense to me.

Thanks,
Jonathan
RE: Disable Caching for mmap() address
On Mon, 2009-11-09 at 16:21 -0700, Jonathan Haws wrote:

All, I would like to disable caching for an address that was returned from a call to mmap(). I am using this address for DMA operations in user space and want to make sure that the data cache is turned off for that buffer. The way this works is that the driver simply takes an address I provide and begins a DMA operation to that location in RAM (I have already ensured that this is a physical address I am passing). When the DMA is complete, an interrupt fires and the ISR gives a semaphore that the user space application is pending on (RT_SEM from Xenomai). I have tried simply calling a cache invalidate routine in the ISR before I give the semaphore, but the kernel crashes when I try to call that routine - my guess is that it is because the kernel does not have direct access to that location in memory (only my application does, according to the MMU). Anyway, all I want to do is make sure that the buffer is never stored in the cache and that I always fetch it from RAM. How can I specify that using mmap() on the /dev/mem device, or is there a better way to accomplish this?

There is no proper way to do this, in large part because it's not always legal to map memory non-cached for various reasons I don't have time to explain right now... You may be able to get it working, though, by using a specific driver with an mmap function that tweaks the attributes, or by using mmap of /dev/mem after opening it with O_SYNC (off the top of my mind). But it's a bit fishy, as the kernel has a cacheable mapping of most of memory and so you may end up with cache aliases...

Thanks for the response, Ben. I am hoping that by passing a mem= argument to the kernel at boot time, the memory that I am setting aside for my DMA will be kept hidden from the kernel and the MMU. I am then mapping that memory in user space with mmap() on /dev/mem, and that descriptor is opened with the O_SYNC flag. I just wanted to make sure I was covering all my bases.

Thanks,
Jonathan
Disable Caching for mmap() address
All, I would like to disable caching for an address that was returned from a call to mmap(). I am using this address for DMA operations in user space and want to make sure that the data cache is turned off for that buffer.

The way this works is that the driver simply takes an address I provide and begins a DMA operation to that location in RAM (I have already ensured that this is a physical address I am passing). When the DMA is complete, an interrupt fires and the ISR gives a semaphore that the user space application is pending on (RT_SEM from Xenomai). I have tried simply calling a cache invalidate routine in the ISR before I give the semaphore, but the kernel crashes when I try to call that routine - my guess is that it is because the kernel does not have direct access to that location in memory (only my application does, according to the MMU).

Anyway, all I want to do is make sure that the buffer is never stored in the cache and that I always fetch it from RAM. How can I specify that using mmap() on the /dev/mem device, or is there a better way to accomplish this?

Thanks,
Jonathan
Invalidate Data Cache from User Space
All, I have a routine to invalidate the data cache from user space (since I do not believe there is a standard routine I can use outside of kernel space?). Here is the code:

    .text
    .globl cacheInvalidate405
cacheInvalidate405:
    /*
     * r3 = Data cache
     * r4 = address
     * r5 = number of bytes
     */
    cmpwi   r5,0            /* nothing to do if number of bytes is 0 */
    beq     invalDone
    add     r6,r4,r5
    addi    r6,r6,31
    rlwinm  r6,r6,0,0,26    /* end addr to start of next cache line */
    rlwinm  r7,r4,0,0,26    /* start address back to start of line */
    sub     r6,r6,r7
    srawi   r6,r6,5         /* divide by 32 to get number of lines */
    mtctr   r6
invalLoop:
    dcbi    r0,r4           /* THIS INSTRUCTION FAILS! */
    addi    r4,r4,32
    bdnz    invalLoop
    sync
invalDone:
    blr
    .size cacheInvalidate405, . - cacheInvalidate405

What is happening is that the dcbi instruction fails: I get an Illegal Instruction message on the console and my program exits. Is there a reason I cannot call dcbi from a user space application, or is there something wrong in my code? Even better, is there a working and tested function that I can call from user space to invalidate a portion of the data cache?

Thanks!
Jonathan
Interrupts not Firing on PPC405EX
All, I am having some trouble getting interrupts to fire from my kernel module. I have connected the ISR with a call to request_irq() and have configured my device to generate interrupts. However, my ISR is called only once, when I connect the interrupt for the first time. After that it is never called again. It seems as though the interrupt is getting stuck disabled, but I cannot see why. The device is on the PCIE0 bus and works just fine in another OS (namely VxWorks - that is the driver I am porting to Linux).

Here is how I am connecting the ISR, and the ISR itself. Am I doing something stupid?

Thanks for the help!
Jonathan

PS - Our hardware is a custom-spun PPC405EX board based on the AMCC Kilauea board and uses kilauea.dts with no modifications.

A quick note - I realize that I am not checking whether I was the one to interrupt the CPU. I am not worried about that right now, especially since I know there is nothing else that will interrupt the CPU on this IRQ anyway - it never fires.

int fpga_open(struct inode *inode, struct file *filp)
{
    int err = 0;

    /* Make sure we have successfully probed the device */
    if (NULL == fpga_drv.pcidev) {
        return -ENODEV;
    }

    /* Only one process at a time can have access to the FPGA */
    if (0 != atomic_read(&fpga_drv.openCount)) {
        atomic_inc(&fpga_drv.openCount);
        printk(KERN_WARNING "FPGA: Could not open device: already open.\n");
        return -EBUSY;
    }

    /* If not already in use, state that we are */
    atomic_inc(&fpga_drv.openCount);

    /* Store a pointer to the PCI device structure */
    filp->private_data = fpga_drv.pcidev;

    /* Attach ISR to IRQ */
    if (request_irq(fpga_drv.pcidev->irq, fpga_isr, IRQF_SHARED,
                    FPGA_MODULE_NAME, fpga_drv.pcidev)) {
        printk(KERN_ERR "FPGA: Unable to connect FPGA ISR (%d)!\n",
               fpga_drv.pcidev->irq);
        return -EPERM;
    }

    return 0;
}

/* Interrupt Service Routine */
irqreturn_t fpga_isr(int irq, void *dev_id)
{
    uint32_t status = 0;

    status = fpga_drv.cfg_ptr[FPGA_INTER_STATUS];
    printk(KERN_NOTICE "FPGA: Interrupt fired! (%#08x)\n", status);

    if (status & FPGA_INTERRUPT_SPI) {
        rt_sem_v(&fpga_drv.sarSem);
    }

    /* Return HANDLED */
    return (IRQ_HANDLED);
}
RE: DMA to User-Space
Jonathan Haws wrote:

All, I have what may be an unconventional question: Our application consists of data being captured by an FPGA, processed, and transferred to SDRAM. I give the FPGA an address where I want the data stored in SDRAM, and it DMAs the data over and interrupts me when finished. I then take that data and store it to disk. I have code in user space that handles all of the writing to disk nicely and fast enough for my application (I am capturing data at about 35-40 Mbytes/sec). My question is this: is it possible to give the FPGA a user-space pointer to DMA to? It seems like I would have problems with alignment, address manipulation, and a whole slew of other issues. What would be the best way to accomplish something like that? I want to handle all the disk access in user space, but I do not want to have to copy 40 MB/s from kernel space to user space either.

You can maintain a DMA buffer in the kernel, then mmap it to user space. And maybe you need some handshake between the FPGA and the apps to balance incoming data against data written to disk.

I can maintain an allocated, DMA-safe buffer in kernel space if needed. Can I simply get a user-space pointer to that buffer? What calls are needed to translate addresses?

Use remap_pfn_range() in the .mmap() handler of your kernel DMA buffer driver to export the DMA buffer to user space.

Can you provide an example of how to do that? I have an mmap routine to map BARs that the FPGA maintains, and I can access those; however, when I try to map the DMA buffer and access what is in it, the system crashes. Here is the mmap function I have right now:

/* fpga_mmap()
 *
 * Description:
 *   The purpose of this function is to serve as a 'file_operation'
 *   which maps different PCI resources into the calling processes
 *   memory space.
 *
 *   NOTE: The file offsets are in page-size units; i.e.:
 *     offset 0 in process's mmap syscall     - BAR0
 *     offset 4096 in process's mmap syscall  - BAR1
 *     offset 8192 in process's mmap syscall  - BAR2
 *     offset 12288                           - Streaming DMA buffer
 *
 * Arguments:
 *   struct file *filp           -- struct representing an open file
 *   struct vm_area_struct *vma  -- struct representing memory 'segment'
 *
 * Returns:
 *   int -- indication of success or failure
 *
 */
int fpga_mmap(struct file *filp, struct vm_area_struct *vma)
{
    struct pci_dev *dev;
    unsigned long addressToMap;
    uint8_t mapType = 0; /* 0 = IO, 1 = memory */

    /* Get the PCI device */
    dev = (struct pci_dev*)(filp->private_data);

    /* Map in the appropriate BAR based on page offset */
    if (vma->vm_pgoff == FPGA_CONFIG_SPACE) {
        /* Map BAR1 (the CONFIG area) */
        printk(KERN_ALERT "FPGA: Mapping BAR1 (CONFIG BAR).\n");
        addressToMap = pci_resource_start(dev, FPGA_CONFIG_SPACE);
        printk(KERN_ALERT "FPGA: PCI BAR1 (CONFIG BAR) Size - %#08x.\n",
               pci_resource_len(dev, FPGA_CONFIG_SPACE));
        mapType = 0;
    } else if (vma->vm_pgoff == FPGA_TEST_SPACE) {
        /* Map BAR2 (the TEST area) */
        printk(KERN_ALERT "FPGA: Mapping BAR2 (TEST BAR).\n");
        addressToMap = (pci_resource_start(dev, FPGA_TEST_SPACE) +
                        pci_resource_len(dev, FPGA_TEST_SPACE)) - FPGA_TEST_LENGTH;
        printk(KERN_ALERT "FPGA: PCI BAR2 (TEST BAR) Size - %#08x.\n",
               pci_resource_len(dev, FPGA_TEST_SPACE));
        mapType = 0;
    } else if (vma->vm_pgoff == 3) {
        addressToMap = (unsigned long)fpga_drv.strmData[0];
        mapType = 1;
    } else {
        printk(KERN_ALERT "FPGA: Invalid BAR mapping specified.\n");
        return ERROR;
    }

    /* Execute the mapping */
    vma->vm_flags |= VM_IO;
    vma->vm_flags |= VM_RESERVED;
    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

    printk(KERN_ALERT "FPGA: vmSize - 0x%x.\n",
           (unsigned int)(vma->vm_end - vma->vm_start));

    if (mapType == 0) {
        if (io_remap_pfn_range(vma, vma->vm_start, addressToMap >> PAGE_SHIFT,
                               vma->vm_end - vma->vm_start,
                               vma->vm_page_prot) != 0) {
            printk(KERN_ALERT "FPGA: Failed to map BAR PCI space to user space.\n");
            return ERROR;
        }
    } else {
        printk(KERN_NOTICE "FPGA: Mapping stream ptr (%#08x) to user space\n",
               (uint32_t)fpga_drv.strmData[0]);
        printk(KERN_NOTICE "FPGA: Setting strmData[0][0] to 0x37\n");
        fpga_drv.strmData[0][0] = 0x37;
        if (remap_pfn_range(vma, vma->vm_start, addressToMap >> PAGE_SHIFT
RE: DMA to User-Space
1. I open /dev/mem and get a file descriptor.
2. I use mmap() to reserve some physical addresses for my buffers in user space.
3. I give that address to the FPGA for DMA use.
4. When I get the FPGA interrupt, I invalidate the data cache and write the data to disk.

Does that sound like it would work? Would the address I receive from mmap() and pass to the FPGA be the actual physical address, or would I need to send the physical address to the FPGA and use the mmap() address to access the data and write it to disk?

One more question about this approach: does the mmap() call prevent the kernel from using this memory for other purposes? Will the kernel be able to move this memory elsewhere? I guess what I am asking is whether this memory is locked against all other uses.
DMA to User-Space
All, I have what may be an unconventional question:

Our application consists of data being captured by an FPGA, processed, and transferred to SDRAM. I give the FPGA an address where I want the data stored in SDRAM, and it DMAs the data over and interrupts me when finished. I then take that data and store it to disk. I have code in user space that handles all of the writing to disk nicely and fast enough for my application (I am capturing data at about 35-40 Mbytes/sec).

My question is this: is it possible to give the FPGA a user-space pointer to DMA to? It seems like I would have problems with alignment, address manipulation, and a whole slew of other issues. What would be the best way to accomplish something like that? I want to handle all the disk access in user space, but I do not want to have to copy 40 MB/s from kernel space to user space either.

I can maintain an allocated, DMA-safe buffer in kernel space if needed. Can I simply get a user-space pointer to that buffer? What calls are needed to translate addresses?

Thanks for the help! I am still a newbie when it comes to kernel programming, so I really appreciate it!
Jonathan
RE: Accessing flash directly from User Space [SOLVED]
On Friday 30 October 2009 15:50:22 Jonathan Haws wrote:

I suspect that the msync() was merely serving as a very heavyweight memory barrier.

I did try hacking the mb() calls from the kernel source to use them in user space, but they had no effect. I still had to include the calls to msync().

What did the resulting mb() that you used look like?

asm("eieio; sync");
RE: Accessing flash directly from User Space [SOLVED]
On Friday 30 October 2009 16:08:55 Alessandro Rubini wrote:

asm("eieio; sync");

Hmm... : : : "memory"

And, doesn't ; start a comment in assembly? (no, not on powerpc it seems)

Yes, I think the barrier is wrong. Please try with

#define mb() __asm__ __volatile__("eieio\n sync\n" : : : "memory")

That definition worked great. I must have missed the : : : "memory" bit when I was digging through the code. Thanks, that gives me about a 2x speedup over the msync() calls. Thanks for the help!
RE: Accessing flash directly from User Space
On Tue, Oct 27, 2009 at 04:52:40PM -0600, Jonathan Haws wrote:

Will the device respond to 0x1234 being written at offset zero? You generally have to poke these things pretty specifically in order to get them to go into command mode.

It should, because that is the first data location in flash.

I don't follow. Even if you have an Intel command set flash (and thus don't need unlock writes), 0x1234 isn't a valid command that I know of. The flash doesn't behave as a register that you can read back; it just responds in a certain way based on what you write to it.

Also, just to be sure I am telling the truth, I tried writing to one of the registers to set up an erase and got the same results - the value did not get written.

Following the exact sequence that the driver uses? What did you write, what did you expect (you're generally not going to get the same thing back that you wrote), and what did you get? What kind of command set, bus width, and interleaving do you have?

I used the erase pattern, then the write pattern for my flash device. When I tried to read back the value that should have been stored, it was what it had been previously.

If you manually do the same exact accesses from a firmware prompt, external debugger, etc., does it work?

The driver works perfectly in VxWorks.

On this exact hardware?

Yes.

Including the 0x1234 thing?

Actually, I have not tried that - I have not had to, since the driver worked.

What happens without the 0x1234?

I have not bothered to try it. My guess, now that I know what the problem was, is that it would not read back 0x1234. In the test I performed, I intended to erase the sector, prep it for writing, then write 0x1234 to the first two bytes in flash. However, in my rush to find out what the heck was going on, I failed to include the code to erase and prep the sector for writing.

As I mentioned previously, I was just not allowing the correct sequence of operations to take place to erase the sector (that is where my problem began): when I set up the sector for erasure, the sequencing did not happen correctly because what I assigned to flash was not committed immediately. I hope that makes sense.

Thanks,
Jonathan
RE: Accessing flash directly from User Space [SOLVED]
Does O_DIRECT help? (you may need to define _GNU_SOURCE before #include)

Nope, O_DIRECT did not help - in fact, it caused the application to crash. I am not sure why, but it crashed.
RE: Accessing flash directly from User Space [SOLVED]
Anyway, to make a long story short, I inserted an msync() after each assignment to the flash. This resolved my problem and I can now program my flash.

Ouch, this was news to me too. Calling msync() after every write kills performance. We use mmap(/dev/mem) to access HW and haven't seen any issues yet. Is this perhaps a new behaviour of mmap(/dev/mem), and is there a way to avoid calling msync()?

The address range should be outside the DRAM and thus uncached. Any write to any address in the mmapped range should go directly to the NOR flash. Any other behavior is a bug. It's not mapping an actual file here.

That is what I was thinking. But I have a working driver, and I can live with the extra delay the msync() calls add to flash writes. I only ever write data to flash when I need to update one of my boot images.

Thanks,
Jonathan
RE: Accessing flash directly from User Space
Looking through our notes and talking with the engineer who was performing the tests, it was exactly that - MTD was waiting for a signal that was produced differently than the hardware ready signal. By simply polling the flash until the hardware ready signal toggled, we were able to get much faster read and write speeds. Granted, most of our signals are being sent through a CPLD, so that may be why MTD did not work as well.

Even if your problem is solved, I'd like to understand this performance issue. I had a look at the datasheet of the Spansion S29GL MirrorBit flash as an example. They provide a dedicated pin, RY/BY#, which signals the end of an embedded algorithm (erase or programming). While figure 11.9 shows no timing advance of RY/BY# over Dout on the data line, figure 11.12 shows one of unspecified length between RY/BY# and the end of data toggling. If you had a 10-fold slowdown with MTD, either the CPLD really slows down read access to the flash, or maybe your custom driver uses some acceleration (write buffer programming, unlock bypass, accelerated program with 12V on the WP#/ACC pin) while MTD does not. Which kernel version and flash device did you use in this comparison?

We were using VxWorks when we did the comparison, so there may be a problem in their driver. We are using unlock bypass, by the way. Our flash chip is from Spansion. I do not have the datasheet with me right now, so I do not have the part number.
RE: Accessing flash directly from User Space
On Tue, Oct 27, 2009 at 04:24:53PM -0600, Jonathan Haws wrote:

How can I get that pointer? Unfortunately I cannot simply use the address of the flash. Is there some magical function call that gives me access to that portion of the memory space?

$ man 2 mmap

You want MAP_SHARED and O_SYNC.

To use that, I need to have a file descriptor to a device, do I not? However, I do not have a base flash driver to give me that file descriptor. Am I missing something with that call?

/dev/mem

Okay, I now have access to the flash memory; however, when I write to it the writes do not take. I have tried calling msync() on the mapping to no avail. I have opened the fd with O_SYNC, but cannot get things to work right. Here are the calls:

int fd = open("/dev/mem", O_SYNC | O_RDWR);
uint16_t * flash = (uint16_t *)mmap(NULL, NOR_FLASH_SIZE, (PROT_READ | PROT_WRITE), MAP_PRIVATE, fd, NOR_FLASH_BASE_ADRS);

What board and CPU are you using? Is your flash really at 0xFC80, or is that the virtual address that VxWorks puts it at?

I am using a custom board based on the AMCC Kilauea development board. It uses a 405EX CPU. Yes, the flash is really at 0xFC00.
RE: Accessing flash directly from User Space
On Tue, 2009-10-27 at 16:52 -0600, Jonathan Haws wrote:

Jonathan Haws wrote: I had thought about using MTD, but decided against it because in previous benchmarking of MTD against our custom driver, we found that our custom driver was about 10x faster.

Ouch. Any idea where the slowdown is coming from?

From what I remember (I would have to dig up notes to make sure), it has something to do with MTD looking for a signal to go high that is processed a bunch before MTD even sees it. Our flash produces a hardware ready signal that we trigger off of to move on. MTD took much longer to report to us that the hardware was ready. Thanks

It would be interesting to know in more detail what it was. If we have a 10x performance increase hiding from us, I would be very interested in knowing where it is. Are you using some custom command to the flash that the generic chip drivers in Linux are not yet supporting?

Looking through our notes and talking with the engineer who was performing the tests, it was exactly that - MTD was waiting for a signal that was produced differently than the hardware ready signal. By simply polling the flash until the hardware ready signal toggled, we were able to get much faster read and write speeds. Granted, most of our signals are being sent through a CPLD, so that may be why MTD did not work as well.
RE: Accessing flash directly from User Space [SOLVED]
On Tue, Oct 27, 2009 at 04:24:53PM -0600, Jonathan Haws wrote:

How can I get that pointer? Unfortunately I cannot simply use the address of the flash. Is there some magical function call that gives me access to that portion of the memory space?

$ man 2 mmap

You want MAP_SHARED and O_SYNC.

To use that, I need to have a file descriptor to a device, do I not? However, I do not have a base flash driver to give me that file descriptor. Am I missing something with that call?

/dev/mem

Okay, I now have access to the flash memory; however, when I write to it the writes do not take. I have tried calling msync() on the mapping to no avail. I have opened the fd with O_SYNC, but cannot get things to work right. Here are the calls:

int fd = open("/dev/mem", O_SYNC | O_RDWR);
uint16_t * flash = (uint16_t *)mmap(NULL, NOR_FLASH_SIZE, (PROT_READ | PROT_WRITE), MAP_PRIVATE, fd, NOR_FLASH_BASE_ADRS);

What board and CPU are you using? Is your flash really at 0xFC80, or is that the virtual address that VxWorks puts it at?

I am using a custom board based on the AMCC Kilauea development board. It uses a 405EX CPU. Yes, the flash is really at 0xFC00.

I have found the problem. It occurred to me in the shower (okay, not really, but most good ideas happen there). What was happening is that I was in fact able to write to the correct registers. However, I would try to write to them in a batch. But the way mmap works (at least according to the man page) with MAP_SHARED is that the file may not be updated until msync() is called. Now, I thought that O_SYNC would take care of that when I opened /dev/mem, but that was not the case.

Anyway, to make a long story short, I inserted an msync() after each assignment to the flash. This resolved my problem and I can now program my flash.

Thanks everyone for your help!
Jonathan
RE: Network Stack SKB Reallocation
try to reuse it later, the kernel would panic because that was not a valid SKB. So, the moral of the story is: keep your MTU at 4000 or lower. This hammers your throughput, but it seems to be the best we can do given the way the stack works.

If anyone has any other solutions, that would be GREAT! I would love to be able to use a 9000-byte MTU without getting out-of-memory errors simply due to fragmentation.

HTH,
Jonathan

-Original Message-
From: linuxppc-dev-bounces+john.p.price=l-3com@lists.ozlabs.org [mailto:linuxppc-dev-bounces+john.p.price=l-3com@lists.ozlabs.org] On Behalf Of Jonathan Haws
Sent: Monday, October 26, 2009 2:43 PM
To: linuxppc-dev@lists.ozlabs.org
Subject: Network Stack SKB Reallocation

Quick question about the network stack in general: does the stack itself release an SKB allocated by the device driver back to the heap upstream, or does it require that the device driver handle that?

Thanks!
Jonathan
Accessing flash directly from User Space
I know this is probably a really dumb question, but a wise man once said that the only stupid question is the one that is not asked. So, I have written a flash driver in VxWorks that simply addresses the flash directly and handles all the hardware accesses just fine. I am porting that to Linux and need it to run in user space (mainly to simplify the interface with the user; I want to keep it the same as in VxWorks). Here is a snippet showing what my question is about:

    static uint8_t bflashEraseSector(int sa, int verbose)
    {
        uint16_t *flash = (uint16_t *) NOR_FLASH_BASE_ADRS;
        uint32_t offset;
        ...
        /* We divide by 2 here to adjust for the 16-bit offset into the address */
        offset = sa * NOR_FLASH_SECTOR_SIZE / 2;
        flash[BFLASH_SECTOR_ERASE_ADDR1] = BFLASH_SECTOR_ERASE_BYTE1;
        ...
    }

I am trying to get a pointer to NOR_FLASH_BASE_ADRS, which is defined to be 0xFC00. I then dereference that directly to write to the flash. How can I get that pointer? Unfortunately I cannot simply use the address of the flash. Is there some magical function call that gives me access to that portion of the memory space?

Thanks for the help!
Jonathan

PS - I know that I could simply use the MTD driver provided by the kernel, but I need to be able to keep the interface the same so we can use previously written code.
RE: Accessing flash directly from User Space
How can I get that pointer? Unfortunately I cannot simply use the address of the flash. Is there some magical function call that gives me access to that portion of the memory space?

$ man 2 mmap
You want MAP_SHARED and O_SYNC.

To use that I need to have a file descriptor to a device, do I not? However, I do not have a base flash driver to give me that file descriptor. Am I missing something with that call?

Thanks!
RE: Accessing flash directly from User Space
Jonathan Haws wrote:
How can I get that pointer? Unfortunately I cannot simply use the address of the flash. Is there some magical function call that gives me access to that portion of the memory space?

$ man 2 mmap
You want MAP_SHARED and O_SYNC.

To use that I need to have a file descriptor to a device, do I not? However, I do not have a base flash driver to give me that file descriptor. Am I missing something with that call?

/dev/mem

Ah, yes. I told you this was going to be a dumb question. Thanks, Bill.

Jonathan
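A sketch of the /dev/mem approach with the page-alignment detail handled: mmap() only accepts an offset that is a multiple of the page size, so the mapping starts at the page boundary at or below the requested physical address and the returned pointer is moved forward by the difference. map_phys() and struct phys_map are hypothetical names; the flags follow the suggestion in this thread (O_SYNC on the open, MAP_SHARED on the mapping). On a 32-bit system with a 32-bit off_t, a physical address at or above 2 GB may need the program built with -D_FILE_OFFSET_BITS=64 so the offset does not go negative.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

struct phys_map {
    void *base;               /* page-aligned address returned by mmap() */
    size_t maplen;            /* number of bytes actually mapped */
    volatile uint16_t *ptr;   /* points at the requested address itself */
};

/* Hypothetical helper: map `len` bytes starting at offset `phys` of `path`.
 * For "/dev/mem" the offset is a physical address. Returns 0 or -1. */
static int map_phys(const char *path, off_t phys, size_t len, struct phys_map *out)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && out != NULL && phys >= 0);

    off_t aligned = phys & ~((off_t)page - 1);   /* round down to a page */
    size_t delta = (size_t)(phys - aligned);

    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;

    void *base = mmap(NULL, len + delta, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, aligned);
    close(fd);   /* the mapping remains valid after close() */
    if (base == MAP_FAILED)
        return -1;

    out->base = base;
    out->maplen = len + delta;
    out->ptr = (volatile uint16_t *)((char *)base + delta);
    return 0;
}
```

On the target this would be called as map_phys("/dev/mem", NOR_FLASH_BASE_ADRS, NOR_FLASH_SIZE, &m), after which m.ptr plays the role of flash in bflashEraseSector(); the mapping is released with munmap(m.base, m.maplen). The same offset arithmetic works on an ordinary file, which is handy for trying it out off-target.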
RE: Accessing flash directly from User Space
How can I get that pointer? Unfortunately I cannot simply use the address of the flash. Is there some magical function call that gives me access to that portion of the memory space?

$ man 2 mmap
You want MAP_SHARED and O_SYNC.

To use that I need to have a file descriptor to a device, do I not? However, I do not have a base flash driver to give me that file descriptor. Am I missing something with that call?

/dev/mem

Okay, I now have access to the flash memory; however, when I write to it the writes do not take. I have tried calling msync() on the mapping to no avail. I have opened the fd with O_SYNC, but cannot get things to work right. Here are the calls:

    int fd = open("/dev/mem", O_SYNC | O_RDWR);
    uint16_t *flash = (uint16_t *)mmap(NULL, NOR_FLASH_SIZE, (PROT_READ | PROT_WRITE),
                                       MAP_PRIVATE, fd, NOR_FLASH_BASE_ADRS);

When I do flash[0] = 0x1234 and then check the value, they do not match:

    flash[0] = 0x1234;
    msync(flash, NOR_FLASH_SIZE, MS_SYNC | MS_INVALIDATE);
    printf("flash[0] = %#04x\n", flash[0]);

That prints flash[0] = 0x7f45. I have verified that I am reading the correct values: I can display the flash contents in U-Boot, and 7f45 is what is in the first 16 bits of flash. Why can I not write to flash? What am I doing wrong?

Thanks!
Jonathan
RE: Accessing flash directly from User Space
Okay, I now have access to the flash memory, however when I write to it the writes do not take.
(PROT_READ | PROT_WRITE), MAP_PRIVATE, fd,

MAP_SHARED. Bill told you. With MAP_PRIVATE you write to a local in-RAM copy of the data, not to the original one.

I apologize; that MAP_PRIVATE was left over from me trying to get it to work. With MAP_SHARED I am having the problem. Sorry for the confusion.
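The MAP_PRIVATE versus MAP_SHARED difference described in the reply is easy to demonstrate on an ordinary file, where the same flag semantics apply: a store through a MAP_PRIVATE mapping goes into a private copy-on-write page and never reaches the file, while a MAP_SHARED store does. store_then_read_back() is a hypothetical helper written only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first page of `fd` with `flags` (MAP_SHARED or MAP_PRIVATE), store
 * `value` in the first 16-bit word, unmap, then return what a plain pread()
 * of the underlying file sees at offset 0 (or -1 on error). */
static long store_then_read_back(int fd, int flags, uint16_t value)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && (flags == MAP_SHARED || flags == MAP_PRIVATE));

    uint16_t *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = value;          /* MAP_PRIVATE: lands in a private copy only */
    if (flags == MAP_SHARED)
        msync(p, (size_t)page, MS_SYNC);
    munmap(p, (size_t)page);

    uint16_t seen = 0;
    if (pread(fd, &seen, sizeof seen, 0) != (ssize_t)sizeof seen)
        return -1;
    return seen;
}
```

On a zero-filled file, the MAP_PRIVATE call reads back 0, while the MAP_SHARED call reads back the stored value.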
RE: Accessing flash directly from User Space
Jonathan Haws wrote:

    flash[0] = 0x1234;
    msync(flash, NOR_FLASH_SIZE, MS_SYNC | MS_INVALIDATE);
    printf("flash[0] = %#04x\n", flash[0]);

That prints flash[0] = 0x7f45. I have verified that I am reading the correct values. I can display the flash contents in U-Boot and 7f45 is what is in the first 16 bits of flash. Why can I not write to flash? What am I doing wrong?

Flash does not work that way -- you must send it commands to erase a block, and then further commands to program new data.

I realize that. I have a driver written that does exactly that. However, I need to be able to write to certain registers to set up the erasure.

Will the device respond to 0x1234 being written at offset zero? You generally have to poke these things pretty specifically in order to get them to go into command mode.

It should, because that is the first data location in flash. Also, just to be sure I am telling the truth, I tried writing to one of the registers to set up an erase and got the same result: the value did not get written.

The driver works perfectly in VxWorks,

Including the 0x1234 thing?

Actually, I have not tried that; I have not had to, since the driver worked.

It sounds like what you really want is the /dev/mtd or /dev/mtdblock interface, not raw access to the flash chip.

As mentioned in my initial post, I need to use my custom driver to maintain the interface to the application that uses the flash for data storage. I had thought about using MTD, but decided against it because, in previous benchmarking of MTD against our custom driver, we found that our custom driver was about 10x faster.

Ouch. Any idea where the slowdown is coming from?

From what I remember (I would have to dig up notes to make sure), it has something to do with MTD looking for a signal to go high that is processed a bunch before MTD even sees it. Our flash produces a hardware ready signal that we trigger off of to move on. MTD took much longer to report to us that the hardware was ready.
Thanks
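For reference, on many NOR parts the "commands" mentioned above are specific write cycles to specific word offsets. The sketch below assumes an AMD/JEDEC-compatible command set in x16 mode (unlock cycles 0xAA at word 0x555 and 0x55 at word 0x2AA, erase setup 0x80, then 0x30 written to the sector address); the macro and function names are hypothetical stand-ins for the BFLASH_* constants in the original driver, and the offsets must be checked against the actual part's datasheet. As in bflashEraseSector(), the sector base is computed in 16-bit words (sector size in bytes divided by 2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED AMD/JEDEC-style command set, x16 mode (word offsets). */
#define FLASH_CMD_ADDR1     0x555u
#define FLASH_CMD_ADDR2     0x2AAu
#define FLASH_UNLOCK1       0x00AAu
#define FLASH_UNLOCK2       0x0055u
#define FLASH_ERASE_SETUP   0x0080u
#define FLASH_SECTOR_ERASE  0x0030u

/* Start erasing sector `sector`. `flash` points at the base of the device
 * (for example a MAP_SHARED mapping of NOR_FLASH_BASE_ADRS) and
 * `sector_words` is the sector size in 16-bit words, i.e. bytes / 2.
 * Completion is NOT waited for here; poll the status bits or ready line. */
static void flash_sector_erase(volatile uint16_t *flash, size_t sector,
                               size_t sector_words)
{
    assert(flash != NULL && sector_words > 0);
    volatile uint16_t *sa = flash + sector * sector_words;

    flash[FLASH_CMD_ADDR1] = FLASH_UNLOCK1;      /* unlock cycle 1 */
    flash[FLASH_CMD_ADDR2] = FLASH_UNLOCK2;      /* unlock cycle 2 */
    flash[FLASH_CMD_ADDR1] = FLASH_ERASE_SETUP;  /* erase setup */
    flash[FLASH_CMD_ADDR1] = FLASH_UNLOCK1;      /* unlock again */
    flash[FLASH_CMD_ADDR2] = FLASH_UNLOCK2;
    *sa = FLASH_SECTOR_ERASE;                    /* erase this sector */
}
```

Waiting for completion is deliberately left out, since the thread describes triggering off a board-specific hardware ready signal. If each store must be followed by msync(), as in the msync()-per-assignment fix described earlier, that call goes after every cycle.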
Jumbo Frame bug in ibm_newemac driver (was Jumbo Frames, sil24 SATA driver, and kswapd0 page allocation failures)
Okay, I need to revisit this issue. My time was taken away by other things for the past couple of months, but I am now back at this network issue. Here is what I have done:

1. I modified the ibm_newemac driver to follow scatter-gather chains on the RX path. The idea was to set up the driver to only ever deal with single pages. The MAL in the PPC only supports data transfers of up to 4080 bytes (less than a single page), so it appears that the hardware should support single-page chains. I set this up just like the e1000 driver. For whatever reason, this did not work. It is probably because I do not fully understand the Linux network stack yet (as is apparent in the next iteration).

2. I reverted to the original driver and found that, contrary to what I had thought earlier, the driver does allocate a ring of skbs for use in the driver. However, when a jumbo packet is received (larger than 4080 bytes), it uses the skb that was pre-allocated in the ring for the jumbo packet and allocates a new skb to replace the one in the ring. This is where the problem is: in that new allocation to replace the one in the ring. So, to remedy this, I pre-allocated the same number of jumbo skbs for the sole purpose of being used as new skbs for the RX ring. Here is some code that shows the idea:

    static int emac_open(struct net_device *ndev)
    {
        ...
        /* Allocate RX ring */
        for (i = 0; i < NUM_RX_BUFF; ++i) {
            if (emac_alloc_rx_skb(dev, i, GFP_KERNEL)) {
                printk(KERN_ERR "%s: failed to allocate RX ring\n", ndev->name);
                goto oom;
            }
        }
        ...
    }

    static inline int emac_alloc_rx_skb2(struct emac_instance *dev, int slot, gfp_t flags)
    {
        struct sk_buff *skb = dev->rx_skb_pool[slot];

        if (unlikely(!skb))
            return -ENOMEM;

        if (skb_recycle_check(skb, emac_rx_skb_size(dev->rx_skb_size))) {
            dev->rx_skb[slot] = skb;
            dev->rx_desc[slot].data_len = 0;

            skb_reserve(skb, EMAC_RX_SKB_HEADROOM + 2);
            dev->rx_desc[slot].data_ptr =
                dma_map_single(&dev->ofdev->dev, skb->data - 2,
                               dev->rx_sync_size, DMA_FROM_DEVICE) + 2;
            wmb();
            dev->rx_desc[slot].ctrl = MAL_RX_CTRL_EMPTY |
                (slot == (NUM_RX_BUFF - 1) ? MAL_RX_CTRL_WRAP : 0);

            return 0;
        } else {
            printk(KERN_NOTICE "EMAC: SKB not recycleable\n");
            return -ENOMEM;
        }
    }

    static int emac_poll_rx(void *param, int budget)
    {
        ...
    sg:
        if (ctrl & MAL_RX_CTRL_FIRST) {
            BUG_ON(dev->rx_sg_skb);
            if (unlikely(emac_alloc_rx_skb2(dev, slot, GFP_ATOMIC))) {
                DBG(dev, "rx OOM %d" NL, slot);
                ++dev->estats.rx_dropped_oom;
                emac_recycle_rx_skb(dev, slot, 0);
            } else {
                dev->rx_sg_skb = skb;
                emac_recycle_rx_skb(dev, slot, len);
                skb_put(skb, len);
            }
        } else if (!emac_rx_sg_append(dev, slot) && (ctrl & MAL_RX_CTRL_LAST)) {
            skb = dev->rx_sg_skb;
            dev->rx_sg_skb = NULL;

            ctrl &= EMAC_BAD_RX_MASK;
            if (unlikely(ctrl && ctrl != EMAC_RX_TAH_BAD_CSUM)) {
                emac_parse_rx_error(dev, ctrl);
                ++dev->estats.rx_dropped_error;
                dev_kfree_skb(skb);
                len = 0;
            } else {
                /* printk(KERN_NOTICE "EMAC: pushing sg packet\n"); */
                goto push_packet;
            }
        }
        goto skip;
        ...
    }

The changes are the allocation of the rx_skb_pool in emac_open(), the call to emac_alloc_rx_skb2() in emac_poll_rx(), and the modifications to emac_alloc_rx_skb() to create emac_alloc_rx_skb2(). Also, corresponding allocations for rx_skb_pool are made in emac_resize_rx_ring() for when we need to resize the pool. Now the problem that I am having is this: the first time through the ring, things work just fine. But the second time through the loop, the buffers are not cleaned out; they still think they contain data.
I have tried calling skb_recycle_check() to restore the skb to a new state; however, that call fails because apparently the skb cannot be reused for receive. Why is that the case? What am I missing? It seems like I am missing something that allows the skb to be reused. I will admit that I am not a Linux network driver expert, though I am learning. If
Network Stack SKB Reallocation
Quick question about the network stack in general: Does the stack itself release an SKB allocated by the device driver back to the heap upstream, or does it require that the device driver handle that?

Thanks!
Jonathan
RE: Network Stack SKB Reallocation
So, in my case, I allocate a bunch of skbs that I want to be able to reuse during network operation (256, in fact). When I pass one up the stack, the stack will free that skb back to the system, making any further use of it invalid until I call alloc_skb() again?

Thanks.

On Monday 26 October 2009 19:43:00 Jonathan Haws wrote:
Quick question about the network stack in general: Does the stack itself release an SKB allocated by the device driver back to the heap upstream, or does it require that the device driver handle that?

There's the concept of passing responsibilities for the frames between the networking layers. So the driver passes the frame and all responsibilities to the networking stack. So if the networking stack accepts the packet in the first place, it needs to free it (or pass it to somebody else to take care of).

--
Greetings, Michael.
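Michael's point, that the driver hands the frame and all responsibility for it to the stack, means a slot's buffer can never be reused once it has been passed up: the slot needs a fresh buffer, and if none can be allocated, the usual fallback is to drop the frame and keep the old buffer in the ring. Below is a user-space model of that rule, under stated assumptions: malloc()/free() stand in for alloc_skb()/kfree_skb(), stack_consume() stands in for the network stack freeing the buffer, and RING_SIZE, struct rx_ring and all function names are hypothetical. It is not ibm_newemac code, although the drop-and-recycle branch plays the same role as emac_recycle_rx_skb(dev, slot, 0) in the snippets in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_SIZE 4  /* stand-in for NUM_RX_BUFF */

/* User-space model of an RX ring: every slot always owns one buffer. */
struct rx_ring {
    unsigned char *buf[RING_SIZE];
    size_t buf_size;
    unsigned slot;              /* next slot the "hardware" fills */
    unsigned long delivered;    /* frames handed up (ownership transferred) */
    unsigned long dropped;      /* frames dropped: no replacement buffer */
};

typedef unsigned char *(*alloc_fn)(size_t size);
typedef void (*deliver_fn)(unsigned char *buf, size_t len); /* must free(buf) */

static unsigned char *default_alloc(size_t size) { return malloc(size); }
static unsigned char *alloc_fails(size_t size) { (void)size; return NULL; } /* simulated OOM */

/* Stand-in for the network stack: it now owns the buffer and frees it. */
static void stack_consume(unsigned char *buf, size_t len) { (void)len; free(buf); }

static void ring_free(struct rx_ring *r)
{
    for (unsigned i = 0; i < RING_SIZE; i++) {
        free(r->buf[i]);
        r->buf[i] = NULL;
    }
}

static int ring_init(struct rx_ring *r, size_t buf_size)
{
    r->buf_size = buf_size;
    r->slot = 0;
    r->delivered = r->dropped = 0;
    for (unsigned i = 0; i < RING_SIZE; i++)
        r->buf[i] = NULL;
    for (unsigned i = 0; i < RING_SIZE; i++) {
        r->buf[i] = malloc(buf_size);
        if (r->buf[i] == NULL) {
            ring_free(r);
            return -1;
        }
    }
    return 0;
}

/* A frame of `len` bytes has landed in the current slot. */
static void ring_rx(struct rx_ring *r, size_t len, alloc_fn alloc, deliver_fn up)
{
    assert(len <= r->buf_size);
    unsigned char *fresh = alloc(r->buf_size);

    if (fresh != NULL) {
        unsigned char *full = r->buf[r->slot];
        r->buf[r->slot] = fresh;   /* refill the slot first ... */
        up(full, len);             /* ... then give `full` away; never touch it again */
        r->delivered++;
    } else {
        r->dropped++;              /* keep (recycle) the old buffer, drop the frame */
    }
    r->slot = (r->slot + 1) % RING_SIZE;
}
```

The design choice being modeled is that the ring never holds a pointer to a buffer it has given away, so wrapping around the ring (slot going from RING_SIZE - 1 back to 0) is harmless.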
RE: Page map BUG on program exit
Jake,

That is exactly what I needed. Patch 34113 worked like a charm. Thanks for the help!

Jonathan

Here ya go Jonathan,
http://patchwork.ozlabs.org/patch/34047/
http://patchwork.ozlabs.org/patch/34113/
Both patches work for my situation, but I went with the second set as a final patch (34113).
- Jake Magee

On Thu, Oct 22, 2009 at 3:57 PM, Jonathan Haws <jonathan.h...@sdl.usu.edu> wrote:
All,

I am using a 405EX CPU on a custom board. The layout and hardware are very similar to the AMCC Kilauea board. Here is the output of uname -a:

Linux (none) 2.6.30.3-wolverine-dirty #3 PREEMPT Thu Sep 10 11:41:37 MDT 2009 ppc unknown

I am getting the following BUG output when my program exits:

BUG: Bad page map in process main pte:980005d7 pmd:0d840400
addr:4800 vm_flags:400844fb anon_vma:(null) mapping:cd8454f8 index:98000
vma->vm_file->f_op->mmap: fpga_mmap+0x0/0x178 [fpgaDriver]
Call Trace:
[cd84dc40] [c0006f0c] show_stack+0x44/0x16c (unreliable)
[cd84dc80] [c00ba314] print_bad_pte+0x140/0x1d0
[cd84dcb0] [c00ba3ec] vm_normal_page+0x48/0x50
[cd84dcc0] [c00bb2ec] unmap_vmas+0x214/0x614
[cd84dd40] [c00bffe0] exit_mmap+0xd0/0x1b4
[cd84dd70] [c0031e40] mmput+0x50/0x134
[cd84dd80] [c0036470] exit_mm+0x114/0x13c
[cd84ddb0] [c0037d80] do_exit+0xc0/0x68c
[cd84de00] [c0038390] do_group_exit+0x44/0xd8
[cd84de10] [c0044468] get_signal_to_deliver+0x1f8/0x430
[cd84de70] [c0008224] do_signal+0x54/0x29c
[cd84df40] [c0010d5c] do_user_signal+0x74/0xc4

I have an FPGA on the PCIe bus, and I am mapping its BAR0 to user space with a call to mmap(). The mapping works just fine and I can access all the registers in the BAR without a problem. However, on exit this comes up. A Google search showed tons of people with similar problems in standard distributions (Ubuntu primarily), but no resolutions. Has anyone seen this crop up before and know what the issue is? I can include any source code, if that is required.

Thanks!
Jonathan
Page map BUG on program exit
All,

I am using a 405EX CPU on a custom board. The layout and hardware are very similar to the AMCC Kilauea board. Here is the output of uname -a:

Linux (none) 2.6.30.3-wolverine-dirty #3 PREEMPT Thu Sep 10 11:41:37 MDT 2009 ppc unknown

I am getting the following BUG output when my program exits:

BUG: Bad page map in process main pte:980005d7 pmd:0d840400
addr:4800 vm_flags:400844fb anon_vma:(null) mapping:cd8454f8 index:98000
vma->vm_file->f_op->mmap: fpga_mmap+0x0/0x178 [fpgaDriver]
Call Trace:
[cd84dc40] [c0006f0c] show_stack+0x44/0x16c (unreliable)
[cd84dc80] [c00ba314] print_bad_pte+0x140/0x1d0
[cd84dcb0] [c00ba3ec] vm_normal_page+0x48/0x50
[cd84dcc0] [c00bb2ec] unmap_vmas+0x214/0x614
[cd84dd40] [c00bffe0] exit_mmap+0xd0/0x1b4
[cd84dd70] [c0031e40] mmput+0x50/0x134
[cd84dd80] [c0036470] exit_mm+0x114/0x13c
[cd84ddb0] [c0037d80] do_exit+0xc0/0x68c
[cd84de00] [c0038390] do_group_exit+0x44/0xd8
[cd84de10] [c0044468] get_signal_to_deliver+0x1f8/0x430
[cd84de70] [c0008224] do_signal+0x54/0x29c
[cd84df40] [c0010d5c] do_user_signal+0x74/0xc4

I have an FPGA on the PCIe bus, and I am mapping its BAR0 to user space with a call to mmap(). The mapping works just fine and I can access all the registers in the BAR without a problem. However, on exit this comes up. A Google search showed tons of people with similar problems in standard distributions (Ubuntu primarily), but no resolutions. Has anyone seen this crop up before and know what the issue is? I can include any source code, if that is required.

Thanks!
Jonathan
RE: Jumbo Frames, sil24 SATA driver, and kswapd0 page allocation failures
If the hardware supports it, the best way to deal with it is to set up the driver so that it only ever deals in single pages.

I am working on fixing the driver to support NETIF_F_SG and have changed how it receives packets to follow how the e1000 driver does it. Here is where I am at: When I get the first part of the frame, I allocate an skb for the packet. I call dev->page = alloc_page(GFP_ATOMIC) to allocate a page for the 4080 bytes coming from the MAL. I then set up a DMA mapping for that page to get the data out of the MAL (the original code simply used dma_map_single(), but I need a page). Once the DMA mapping has been set up and the data transferred, I call skb_fill_page_desc() to put the data into the skb. I then wrote a function called emac_consume_page(), which unmaps the DMA mapping, frees the page, and updates the lengths in the skb. The relevant source code is at the end of this email.

My problem is this: When I run this code, it appears to create the fragmented packet just fine, but when it passes it up the stack, the kernel spits out these bugs, one after another:

BUG: Bad page state in process swapper pfn:0ee9b
page:c051f360 flags:(null) count:-3 mapcount:0 mapping:(null) index:766
Call Trace:
[c032bc30] [c0006ef0] show_stack+0x44/0x16c (unreliable)
[c032bc70] [c006c438] bad_page+0x94/0x130
[c032bc90] [c006d4a0] get_page_from_freelist+0x458/0x4d4
[c032bd20] [c006d5f4] __alloc_pages_nodemask+0xd8/0x4f8
[c032bda0] [c01a1174] emac_poll_rx+0x300/0x9c8
[c032bdf0] [c019cb64] mal_poll+0xa8/0x1ec
[c032be20] [c01cf218] net_rx_action+0x9c/0x1b4
[c032be50] [c0039678] __do_softirq+0xc4/0x148
[c032be90] [c0004d18] do_softirq+0x78/0x80
[c032bea0] [c0039264] irq_exit+0x64/0x7c
[c032beb0] [c0005210] do_IRQ+0x9c/0xb4
[c032bed0] [c000fa7c] ret_from_except+0x0/0x18
[c032bf90] [c000808c] cpu_idle+0xdc/0xec
[c032bfb0] [c00028fc] rest_init+0x70/0x84
[c032bfc0] [c02e0864] start_kernel+0x240/0x2c4
[c032bff0] [c0002254] start_here+0x44/0xb0
BUG: Bad page state in process swapper pfn:0ee8c
page:c051f180 flags:(null) count:-3 mapcount:0 mapping:(null) index:757
Call Trace:
[c032bc30] [c0006ef0] show_stack+0x44/0x16c (unreliable)
[c032bc70] [c006c438] bad_page+0x94/0x130
[c032bc90] [c006d4a0] get_page_from_freelist+0x458/0x4d4
[c032bd20] [c006d5f4] __alloc_pages_nodemask+0xd8/0x4f8
[c032bda0] [c01a1174] emac_poll_rx+0x300/0x9c8
[c032bdf0] [c019cb64] mal_poll+0xa8/0x1ec
[c032be20] [c01cf218] net_rx_action+0x9c/0x1b4
[c032be50] [c0039678] __do_softirq+0xc4/0x148
[c032be90] [c0004d18] do_softirq+0x78/0x80
[c032bea0] [c0039264] irq_exit+0x64/0x7c
[c032beb0] [c0005210] do_IRQ+0x9c/0xb4
[c032bed0] [c000fa7c] ret_from_except+0x0/0x18
[c032bf90] [c000808c] cpu_idle+0xdc/0xec
[c032bfb0] [c00028fc] rest_init+0x70/0x84
[c032bfc0] [c02e0864] start_kernel+0x240/0x2c4
[c032bff0] [c0002254] start_here+0x44/0xb0

I know that I am missing something when it comes to allocating the pages for the fragments, but when I compare my methodology to the e1000 driver, they appear to be functionally the same. Any ideas? I can send the entire source file for the driver if need be.

Thanks!
Jonathan

Here is the source:

    static int emac_poll_rx(void *param, int budget)
    {
        ...
        /* Other code is here */
    push_packet:
        skb->dev = dev->ndev;
        skb->protocol = eth_type_trans(skb, dev->ndev);
        emac_rx_csum(dev, skb, ctrl);

        if (unlikely(netif_receive_skb(skb) == NET_RX_DROP))
            ++dev->estats.rx_dropped_stack;
    next:
        ++dev->stats.rx_packets;
    skip:
        dev->stats.rx_bytes += len;
        slot = (slot + 1) % NUM_RX_BUFF;
        --budget;
        ++received;
        continue;
    sg:
        if (ctrl & MAL_RX_CTRL_FIRST) {
            BUG_ON(dev->rx_sg_skb);
            if (unlikely(emac_alloc_rx_skb2(dev, slot, GFP_ATOMIC))) {
                DBG(dev, "rx OOM %d (%d) (%d)" NL, slot, dev->rx_skb_size, len);
                ++dev->estats.rx_dropped_oom;
                emac_recycle_rx_skb(dev, slot, 0);
            } else {
                dev->rx_sg_skb = skb;
                skb_fill_page_desc(dev->rx_sg_skb, 0, dev->page, 0, len);
                emac_consume_page(dev, len, slot);
                dev->rx_sg_skb->len += ETH_HLEN;
            }
        } else if (!emac_rx_sg_append(dev, slot) && (ctrl & MAL_RX_CTRL_LAST)) {
            skb = dev->rx_sg_skb;
            dev->rx_sg_skb = NULL;

            ctrl &= EMAC_BAD_RX_MASK;
            if (unlikely(ctrl && ctrl != EMAC_RX_TAH_BAD_CSUM)) {
                emac_parse_rx_error(dev, ctrl);
                ++dev->estats.rx_dropped_error;
                dev_kfree_skb(skb);
                len = 0;
            } else
                goto push_packet;
        }
        ...
        /* Other code is here */
    } /* end of emac_poll_rx */

    static inline int emac_alloc_rx_skb2(struct emac_instance *dev, int slot, gfp_t flags)
    {
        struct sk_buff *skb = alloc_skb(242, flags);

        if (unlikely(!skb))
            return -ENOMEM;

        dev->rx_skb[slot] = skb;
        dev->rx_desc[slot].data_len = 0;
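To make the "single pages" suggestion concrete: a 9000-byte frame received into one linear buffer needs a physically contiguous block of at least 9000 bytes (more once headroom and skb bookkeeping are added), which ends up as an order-2 (16 KB) request to the page allocator. That matches the order:2 failure in the kswapd0 dump in this archive. Splitting the frame at the 4080-byte MAL limit instead needs three independent order-0 pages. The sketch below computes both numbers. PAGE_BYTES, MAL_CHUNK and the two function names are illustrative assumptions, with alloc_order() following the same rounding idea as the kernel's get_order() for 4 KB pages.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES 4096u   /* 4 KB pages, as on the 405EX */
#define MAL_CHUNK  4080u   /* max bytes per MAL RX descriptor, per this thread */

/* Smallest order n with (PAGE_BYTES << n) >= bytes: the buddy-allocator
 * order one physically contiguous buffer of this size needs. */
static unsigned alloc_order(size_t bytes)
{
    unsigned order = 0;
    while (((size_t)PAGE_BYTES << order) < bytes)
        order++;
    return order;
}

/* Number of MAL descriptors (and so order-0 pages, one per fragment)
 * that a frame of `frame_len` bytes spans. */
static size_t mal_chunks(size_t frame_len)
{
    assert(frame_len > 0);
    return (frame_len + MAL_CHUNK - 1) / MAL_CHUNK;
}
```

For a 9000-byte frame, alloc_order(9000) is 2 and mal_chunks(9000) is 3: one 16 KB contiguous block versus three unrelated 4 KB pages, and only the latter is easy to find in a fragmented zone.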
Jumbo Frames, sil24 SATA driver, and kswapd0 page allocation failures
All,

I am having some issues with my target and was hoping that someone could lend a hand. I am using an AMCC 405EX (Kilauea) board running Linux kernel 2.6.31. Here is the problem. I have some code that receives jumbo frames via the EMAC, sticks the data in a buffer, and writes the data out to a solid-state SATA disk (using a Silicon Image 3531 controller). What is happening is that I appear to be running out of memory, and I cannot figure out why. The closest thing I can tell is that the sil24 driver for the SATA controller does not seem to be releasing memory back to the kernel for some reason. After some time of capturing data and logging it to disk, I get the following kernel dump:

kswapd0: page allocation failure. order:2, mode:0x4020
Call Trace:
[cfaa19a0] [c0006ef0] show_stack+0x44/0x16c (unreliable)
[cfaa19e0] [c006f5e4] __alloc_pages_nodemask+0x38c/0x4f8
[cfaa1a60] [c006f770] __get_free_pages+0x20/0x50
[cfaa1a70] [c00955d4] __kmalloc_track_caller+0xcc/0xf0
[cfaa1a90] [c01c437c] __alloc_skb+0x60/0x140
[cfaa1ab0] [c01a319c] emac_poll_rx+0x46c/0x7e4
[cfaa1af0] [c019e85c] mal_poll+0xa8/0x1ec
[cfaa1b20] [c01cfddc] net_rx_action+0x9c/0x1b4
[cfaa1b50] [c003b3a8] __do_softirq+0xc4/0x148
[cfaa1b90] [c0004d18] do_softirq+0x78/0x80
[cfaa1ba0] [c003af94] irq_exit+0x64/0x7c
[cfaa1bb0] [c0005210] do_IRQ+0x9c/0xb4
[cfaa1bd0] [c000fa7c] ret_from_except+0x0/0x18
[cfaa1c90] [c0094dc4] kmem_cache_free+0x74/0xcc
[cfaa1cb0] [c00c0570] free_buffer_head+0x38/0x84
[cfaa1cc0] [c00c0b8c] try_to_free_buffers+0x94/0xe0
[cfaa1cf0] [c0067e70] try_to_release_page+0x6c/0x84
[cfaa1d00] [c0075f58] shrink_page_list+0x648/0x818
[cfaa1de0] [c0076620] shrink_zone+0x4f8/0xac4
[cfaa1f00] [c0077294] kswapd+0x4a0/0x4bc
[cfaa1fc0] [c004d6d8] kthread+0x70/0x74
[cfaa1ff0] [c000f220] kernel_thread+0x4c/0x68
Mem-Info:
DMA per-cpu:
CPU0: hi: 90, btch: 15 usd: 54
Active_anon:5155 active_file:626 inactive_anon:5216 inactive_file:42474 unevictable:0 dirty:176 writeback:0 unstable:0 free:631 slab:6416 mapped:324 pagetables:32 bounce:0
DMA free:2524kB min:2036kB low:2544kB high:3052kB active_anon:20620kB inactive_anon:20864kB active_file:2504kB inactive_file:169896kB unevictable:0kB present:260096kB pages_scanned:64 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 345*4kB 119*8kB 0*16kB 0*32kB 1*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2524kB
43129 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
65536 pages RAM
1397 pages reserved
43434 pages shared
20347 pages non-shared

I am not sure what is causing this. It only happens when I run both the network and the SATA disk at the same time. If I only capture data on the EMAC, things work just fine (I ran the system overnight, capturing data at 36 MB/s without even a hiccup). If I only write data to disk, things seem to work fine. But when I combine the two, things go crazy. Here is the loop:

    for (;;) {
        if (dataLength + 9000 > 16*1024*1024) {
            write(fd, (char *)&rxBuf[count][0], dataLength);
            fsync(fd);
            wrBytes += dataLength;
            dataLength = 0;
            count = (count + 1) % RXCNT;
        }

        bytes = recvfrom(sock.socket, (char *)&rxBuf[count][dataLength], MTUSIZE, 0, NULL, NULL);
        rxBytes += bytes;
        dataLength += bytes;
        sched_yield();
    } /* for(;;) */

A pretty simple loop to receive the data, place it into a buffer, and write it to disk when ready. What is it about the write call that would not release memory? Any ideas? Has anyone seen this type of behavior before?

Thanks!
Jonathan
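Why can an order-2 request fail while larger blocks are still free? The DMA line of the dump lists 345 free 4 kB pages, 119 free 8 kB blocks, and one block each of 64 kB and 128 kB. Assuming the zone watermark check works as in mm/page_alloc.c of the 2.6.3x series, a higher-order request is also measured against the watermark after discounting every free block smaller than the request, with the watermark halved at each order. The model below is a simplified sketch reconstructed from memory of that code, not a copy of it. Treat the exact GFP_ATOMIC adjustments (the assumed ALLOC_HIGH and ALLOC_HARDER reductions) as assumptions, and check them against the kernel source for your version. With the dump's numbers (free:631 pages, min:2036kB = 509 pages), the model rejects order 2 (45 remaining pages against a watermark of 48) while still allowing orders 0 and 1.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ORDER 11  /* orders 0..10, i.e. 4 kB .. 4096 kB, as in the dump */

/* Simplified model of the zone watermark check (reconstructed from memory of
 * mm/page_alloc.c in 2.6.3x; an assumption, not a copy). Units are pages.
 *   free_pages : zone free pages ("free:" in Mem-Info)
 *   nr_free[o] : free blocks of order o (the "DMA: N*4kB ..." line)
 *   min        : min watermark in pages ("min:" divided by 4 kB)
 *   atomic     : caller cannot sleep (GFP_ATOMIC), which lowers the bar
 * lowmem_reserve is ignored because it is 0 in this dump. */
static bool watermark_ok(long free_pages, const long nr_free[MAX_ORDER],
                         int order, long min, bool atomic)
{
    assert(order >= 0 && order < MAX_ORDER);

    free_pages -= (1L << order) - 1;
    if (atomic) {
        min -= min / 2;   /* assumed ALLOC_HIGH reduction */
        min -= min / 4;   /* assumed ALLOC_HARDER reduction */
    }
    if (free_pages <= min)
        return false;

    for (int o = 0; o < order; o++) {
        /* Blocks smaller than the request cannot be used to satisfy it... */
        free_pages -= nr_free[o] << o;
        /* ...but fewer pages are required at each higher order. */
        min >>= 1;
        if (free_pages <= min)
            return false;
    }
    return true;
}
```

If the model is right, the failure is about fragmentation rather than a leak: almost all free memory sits in single pages and 8 kB pairs, which a 16 kB atomic request from emac_poll_rx() cannot use.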