Re: [PATCH 21/21]: powerpc/cell spidernet DMA coalescing
Linas Vepstas wrote:
> The current driver code performs 512 DMA mappings of a bunch of
> 32-byte structures. This is silly, as they are all in contiguous
> memory. This patch changes the code to DMA map the entire area
> with just one call.
>
> Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
> Cc: James K Lewis <[EMAIL PROTECTED]>
> Cc: Arnd Bergmann <[EMAIL PROTECTED]>

While this patch wasn't useful in the current cell implementation of
pci_map_single, it sounds like people are going to be making changes to
that sometime. In light of that new information (new to me, anyway),
this should probably go in after all. Sorry for causing trouble.

Acked-by: Joel Schopp <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 21/21]: powerpc/cell spidernet DMA coalescing
> I started writing the patch thinking it would have some huge effect on
> performance, based on a false assumption about how I/O was done on this
> machine.
>
> *If* this were another pSeries system, then each call to
> pci_map_single() chews up an actual hardware "translation
> control entry" (TCE) that maps pci bus addresses into
> system RAM addresses. These are somewhat limited resources,
> and so one shouldn't squander them. Furthermore, I thought
> TCEs have TLBs associated with them (similar to how virtual
> memory page tables are backed by hardware page TLBs), of which
> there are even fewer. I was thinking that TLB thrashing would
> have a big hit on performance.
>
> Turns out that there was no difference to performance at all,
> and a quick look at "cell_map_single()" in arch/powerpc/platforms/cell
> made it clear why: there's no fancy i/o address mapping.
>
> Thus, the patch has only marginal benefit; I submit it only in the
> name of "it's the right thing to do anyway".

Well, there is no fancy iommu mapping ... yet. It's been implemented and
is coming after we put together some workarounds for various other
spider hardware issues that trigger when using it (bogus prefetches and
bogus pci ordering). I think the hypervisor based platforms will be
happy with that patch too.

Ben.
Re: [PATCH 21/21]: powerpc/cell spidernet DMA coalescing
Linas Vepstas wrote:
> On Tue, Oct 10, 2006 at 06:46:08PM -0700, Geoff Levand wrote:
>>> Linas Vepstas wrote:
>>>> The current driver code performs 512 DMA mappings of a bunch of
>>>> 32-byte structures. This is silly, as they are all in contiguous
>>>> memory. This patch changes the code to DMA map the entire area
>>>> with just one call.
>>
>> Linas,
>>
>> Is the motivation for this change to improve performance by reducing
>> the overhead of the mapping calls?
>
> Yes.
>
>> If so, there may be some benefit for some systems. Could you please
>> elaborate?
>
> I started writing the patch thinking it would have some huge effect on
> performance, based on a false assumption about how I/O was done on this
> machine.
>
> *If* this were another pSeries system, then each call to
> pci_map_single() chews up an actual hardware "translation
> control entry" (TCE) that maps pci bus addresses into
> system RAM addresses. These are somewhat limited resources,
> and so one shouldn't squander them. Furthermore, I thought
> TCEs have TLBs associated with them (similar to how virtual
> memory page tables are backed by hardware page TLBs), of which
> there are even fewer. I was thinking that TLB thrashing would
> have a big hit on performance.
>
> Turns out that there was no difference to performance at all,
> and a quick look at "cell_map_single()" in arch/powerpc/platforms/cell
> made it clear why: there's no fancy i/o address mapping.

OK, thanks for the explanation.

Actually, the current cell DMA mapping implementation uses a simple
'linear' mapping, in that all of RAM is mapped into the bus DMA address
space at once, and in fact it is all done at system startup. There is
ongoing work to implement 'dynamic' mapping, where DMA pages are mapped
into the bus DMA address space on demand. I think a key point in
understanding the benefit of this is that the cell processor's I/O
controller maps pages per device, so you can map one DMA page to one
device.
I currently have this working for my platform, but have not released
that work. There is some overhead to managing the mapped buffers and to
requesting that pages be mapped by the hypervisor, etc., so I was
thinking that this work of yours to consolidate the memory buffers
prior to requesting the mapping could be of benefit if it were in an
often-executed code path.

-Geoff
Re: [PATCH 21/21]: powerpc/cell spidernet DMA coalescing
On Tue, Oct 10, 2006 at 06:46:08PM -0700, Geoff Levand wrote:
> > Linas Vepstas wrote:
> >> The current driver code performs 512 DMA mappings of a bunch of
> >> 32-byte structures. This is silly, as they are all in contiguous
> >> memory. This patch changes the code to DMA map the entire area
> >> with just one call.
>
> Linas,
>
> Is the motivation for this change to improve performance by reducing
> the overhead of the mapping calls?

Yes.

> If so, there may be some benefit for some systems. Could you please
> elaborate?

I started writing the patch thinking it would have some huge effect on
performance, based on a false assumption about how I/O was done on this
machine.

*If* this were another pSeries system, then each call to
pci_map_single() chews up an actual hardware "translation
control entry" (TCE) that maps pci bus addresses into
system RAM addresses. These are somewhat limited resources,
and so one shouldn't squander them. Furthermore, I thought
TCEs have TLBs associated with them (similar to how virtual
memory page tables are backed by hardware page TLBs), of which
there are even fewer. I was thinking that TLB thrashing would
have a big hit on performance.

Turns out that there was no difference to performance at all,
and a quick look at "cell_map_single()" in arch/powerpc/platforms/cell
made it clear why: there's no fancy i/o address mapping.

Thus, the patch has only marginal benefit; I submit it only in the
name of "it's the right thing to do anyway".

--linas
Re: [PATCH 21/21]: powerpc/cell spidernet DMA coalescing
On Wednesday 11 October 2006 03:46, Geoff Levand wrote:
> > The others look good, but this one complicates the code and doesn't
> > have any benefit. 20 for 21 isn't bad.
>
> Is the motivation for this change to improve performance by reducing
> the overhead of the mapping calls? If so, there may be some benefit
> for some systems. Could you please elaborate?

From what I understand, this patch drastically reduces the number of
I/O PTEs that are needed in the iommu. With the current static IOMMU
mapping, it should only make a difference during initialization, but
any platform that uses a dynamic mapping of iommu entries will benefit
a lot from it, because:

- the card can do better prefetching of consecutive memory
- there are more I/O PTEs available for other drivers.

	Arnd <><
Re: [PATCH 21/21]: powerpc/cell spidernet DMA coalescing
On Tue, 2006-10-10 at 18:20 -0500, jschopp wrote:
> Linas Vepstas wrote:
> > The current driver code performs 512 DMA mappings of a bunch of
> > 32-byte structures. This is silly, as they are all in contiguous
> > memory. This patch changes the code to DMA map the entire area
> > with just one call.
> >
> > Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
> > Cc: James K Lewis <[EMAIL PROTECTED]>
> > Cc: Arnd Bergmann <[EMAIL PROTECTED]>
>
> The others look good, but this one complicates the code and doesn't
> have any benefit. 20 for 21 isn't bad.

Hi Joel !

I'm not sure what you mean here (especially your 20 to 21 comment). The
patch looks perfectly fine to me, and in fact removes more lines of
code than it adds :)

Cheers,
Ben.
Re: [PATCH 21/21]: powerpc/cell spidernet DMA coalescing
jschopp wrote:
> Linas Vepstas wrote:
>> The current driver code performs 512 DMA mappings of a bunch of
>> 32-byte structures. This is silly, as they are all in contiguous
>> memory. This patch changes the code to DMA map the entire area
>> with just one call.
>>
>> Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
>> Cc: James K Lewis <[EMAIL PROTECTED]>
>> Cc: Arnd Bergmann <[EMAIL PROTECTED]>
>
> The others look good, but this one complicates the code and doesn't
> have any benefit. 20 for 21 isn't bad.

Linas,

Is the motivation for this change to improve performance by reducing
the overhead of the mapping calls? If so, there may be some benefit for
some systems. Could you please elaborate?

-Geoff
Re: [PATCH 21/21]: powerpc/cell spidernet DMA coalescing
Linas Vepstas wrote:
> The current driver code performs 512 DMA mappings of a bunch of
> 32-byte structures. This is silly, as they are all in contiguous
> memory. This patch changes the code to DMA map the entire area
> with just one call.
>
> Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
> Cc: James K Lewis <[EMAIL PROTECTED]>
> Cc: Arnd Bergmann <[EMAIL PROTECTED]>

The others look good, but this one complicates the code and doesn't
have any benefit. 20 for 21 isn't bad.
[PATCH 21/21]: powerpc/cell spidernet DMA coalescing
The current driver code performs 512 DMA mappings of a bunch of
32-byte structures. This is silly, as they are all in contiguous
memory. This patch changes the code to DMA map the entire area
with just one call.

Signed-off-by: Linas Vepstas <[EMAIL PROTECTED]>
Cc: James K Lewis <[EMAIL PROTECTED]>
Cc: Arnd Bergmann <[EMAIL PROTECTED]>

 drivers/net/spider_net.c | 107 +++
 drivers/net/spider_net.h |  16 ++-
 2 files changed, 59 insertions(+), 64 deletions(-)

Index: linux-2.6.18-mm2/drivers/net/spider_net.c
===
--- linux-2.6.18-mm2.orig/drivers/net/spider_net.c	2006-10-10 13:40:56.0 -0500
+++ linux-2.6.18-mm2/drivers/net/spider_net.c	2006-10-10 13:42:12.0 -0500
@@ -267,25 +267,6 @@ spider_net_get_descr_status(struct spide
 }
 
 /**
- * spider_net_free_chain - free descriptor chain
- * @card: card structure
- * @chain: address of chain
- *
- */
-static void
-spider_net_free_chain(struct spider_net_card *card,
-		      struct spider_net_descr_chain *chain)
-{
-	struct spider_net_descr *descr;
-
-	for (descr = chain->tail; !descr->bus_addr; descr = descr->next) {
-		pci_unmap_single(card->pdev, descr->bus_addr,
-			SPIDER_NET_DESCR_SIZE, PCI_DMA_BIDIRECTIONAL);
-		descr->bus_addr = 0;
-	}
-}
-
-/**
  * spider_net_init_chain - links descriptor chain
  * @card: card structure
  * @chain: address of chain
@@ -297,15 +278,15 @@ spider_net_free_chain(struct spider_net_
  *
  * returns 0 on success, <0 on failure
  */
-static int
+static void
 spider_net_init_chain(struct spider_net_card *card,
 		      struct spider_net_descr_chain *chain,
 		      struct spider_net_descr *start_descr,
+		      dma_addr_t buf,
 		      int no)
 {
 	int i;
 	struct spider_net_descr *descr;
-	dma_addr_t buf;
 
 	descr = start_descr;
 	memset(descr, 0, sizeof(*descr) * no);
@@ -314,17 +295,12 @@ spider_net_init_chain(struct spider_net_
 	for (i=0; i<no; i++, descr++) {
 		descr->dmac_cmd_status = SPIDER_NET_DESCR_NOT_IN_USE;
 
-		buf = pci_map_single(card->pdev, descr,
-			SPIDER_NET_DESCR_SIZE,
-			PCI_DMA_BIDIRECTIONAL);
-
-		if (pci_dma_mapping_error(buf))
-			goto iommu_error;
-
 		descr->bus_addr = buf;
+
 		descr->next_descr_addr = 0;
 		descr->next = descr + 1;
 		descr->prev = descr - 1;
+		buf += sizeof(struct spider_net_descr);
 	}
 	/* do actual circular list */
 	(descr-1)->next = start_descr;
@@ -333,17 +309,6 @@ spider_net_init_chain(struct spider_net_
 	spin_lock_init(&chain->lock);
 	chain->head = start_descr;
 	chain->tail = start_descr;
-
-	return 0;
-
-iommu_error:
-	descr = start_descr;
-	for (i=0; i < no; i++, descr++)
-		if (descr->bus_addr)
-			pci_unmap_single(card->pdev, descr->bus_addr,
-				SPIDER_NET_DESCR_SIZE,
-				PCI_DMA_BIDIRECTIONAL);
-	return -ENOMEM;
 }
 
 /**
@@ -1658,24 +1623,32 @@ spider_net_open(struct net_device *netde
 {
 	struct spider_net_card *card = netdev_priv(netdev);
 	struct spider_net_descr *descr;
-	int i, result;
+	int result = -ENOMEM;
 
-	result = -ENOMEM;
-	if (spider_net_init_chain(card, &card->tx_chain, card->descr,
-			card->num_tx_desc))
-		goto alloc_tx_failed;
+	card->descr_dma_addr = pci_map_single(card->pdev, card->descr,
+			(card->num_tx_desc+card->num_rx_desc)*sizeof(struct spider_net_descr),
+			PCI_DMA_BIDIRECTIONAL);
+	if (pci_dma_mapping_error(card->descr_dma_addr))
+		return -ENOMEM;
+
+	spider_net_init_chain(card, &card->tx_chain, card->descr,
+			card->descr_dma_addr,
+			card->num_tx_desc);
 	card->low_watermark = NULL;
 
 	/* rx_chain is after tx_chain, so offset is descr + tx_count */
-	if (spider_net_init_chain(card, &card->rx_chain,
+	spider_net_init_chain(card, &card->rx_chain,
 			card->descr + card->num_tx_desc,
-			card->num_rx_desc))
-		goto alloc_rx_failed;
+			card->descr_dma_addr +
+			card->num_tx_desc * sizeof(struct spider_net_descr),
+			card->num_rx_desc);
 
 	descr = card->rx_chain.head;
-	for (i=0; i < card->num_rx_desc; i++, descr++)
+	do {
		descr->next_descr_addr = descr->next->bus_a