Re: [openib-general] Ordering between PCI config space writes and MMIO reads?
From: John Partridge [EMAIL PROTECTED] Date: Wed, 01 Nov 2006 10:27:19 -0600

Sorry, but I find this change a bit puzzling. The problem is particular to the PPB on the HCA and not Altix. I can't see anywhere that a PCI Config Write is required to block until completion; it is the driver and the HCA, not the Altix hardware, that requires the Config Write to have completed before we leave mthca_reset(). Changing pci_write_config_xxx() will change the behavior for ALL drivers, with the possibility of breaking something else. The fix was very low risk in mthca_reset(); changing the PCI code to fix this is much more onerous.

The issue is that something as simple as:

    val = pci_read_config(REG);
    val |= bit;
    pci_write_config(REG, val);
    newval = pci_read_config(REG);
    BUG_ON(!(newval & bit));

is not guaranteed by PCI (apparently). I see no valid reason why every PCI device driver should be troubled with this lunacy; the ordering should thus be ensured by the PCI layer. It just so happens to take care of the original driver issue too :-)

___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH 4 of 9] NetEffect 10Gb RNIC Driver: kernel driver header files
From: Glenn Grundstrom [EMAIL PROTECTED] Date: Thu, 26 Oct 2006 19:06:23 -0500

    +#include "nes_tcpip/include/nes_sockets.h"

I want to know what in the world this nes_tcpip thing is?
Re: [openib-general] Ordering between PCI config space writes and MMIO reads?
From: Matthew Wilcox [EMAIL PROTECTED] Date: Tue, 24 Oct 2006 16:36:32 -0600

This is only really a problem for setup (when we program the BARs), so it seems silly to enforce an ordering at any other time. Reluctantly, I must disagree with Jeff -- drivers need to fix this. One thing is that we definitely don't want to fix this by, for example, reading back the PCI_COMMAND register or something like that. That causes two problems:

1) Some PCI config writes shut the device down and make it not respond to some kinds of PCI config transactions. One example is putting the device into D3 or a similar power state; another is performing a device reset.

2) Several drivers use PCI config space accesses to touch the main registers in order to work around bugs in the PCI-X implementation of their chip or similar (tg3 has a few cases like this); doing a PCI config space readback will kill performance quite a bit in an already slow situation.

In fact, I do recall that one of the x86 PCI config space access implementations did a readback like this, and we had to remove it because it caused problems when doing a reset on tg3 chips via a PCI config space register write.
Re: [openib-general] Dropping NETIF_F_SG since no checksum feature.
From: Michael S. Tsirkin [EMAIL PROTECTED] Date: Thu, 12 Oct 2006 21:12:06 +0200 Quoting r. David Miller [EMAIL PROTECTED]: Subject: Re: Dropping NETIF_F_SG since no checksum feature. Numbers? I created two subnets on top of the same pair of InfiniBand HCAs: I was asking for SG vs. non-SG numbers so I could see proof that it really does help like you say it will.
Re: [openib-general] Dropping NETIF_F_SG since no checksum feature.
From: Michael S. Tsirkin [EMAIL PROTECTED] Date: Wed, 11 Oct 2006 11:05:04 +0200 So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM, data will be copied over rather than sent directly. So why does dev.c have to force set NETIF_F_SG to off then? Because it's more efficient to copy into a linear destination buffer of an SKB than page sub-chunks when doing checksum+copy.
Re: [openib-general] Dropping NETIF_F_SG since no checksum feature.
From: Michael S. Tsirkin [EMAIL PROTECTED] Date: Wed, 11 Oct 2006 17:01:03 +0200 Quoting Steven Whitehouse [EMAIL PROTECTED]:

    ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset,
                         size_t size, int flags)
    {
            ssize_t res;
            struct sock *sk = sock->sk;

            if (!(sk->sk_route_caps & NETIF_F_SG) ||
                !(sk->sk_route_caps & NETIF_F_ALL_CSUM))
                    return sock_no_sendpage(sock, page, offset, size, flags);

So, it seems that if I set NETIF_F_SG but clear NETIF_F_ALL_CSUM, data will be copied over rather than sent directly. So why does dev.c have to force set NETIF_F_SG to off then? I agree with that analysis. So, would you Ack something like the following then? I certainly don't.
Re: [openib-general] Dropping NETIF_F_SG since no checksum feature.
From: Michael S. Tsirkin [EMAIL PROTECTED] Date: Wed, 11 Oct 2006 23:23:39 +0200 With my patch, there is a huge performance gain by increasing MTU to 64K. And it seems the only way to do this is by S/G. Numbers?
Re: [openib-general] Dropping NETIF_F_SG since no checksum feature.
From: Michael S. Tsirkin [EMAIL PROTECTED] Date: Wed, 11 Oct 2006 02:13:38 +0200 Maybe I can patch linux to allow SG without checksum? Dave, maybe you could drop a hint or two on whether this is worthwhile and what are the issues that need addressing to make this work? I imagine it's not just a matter of changing net/core/dev.c :). You can't, it's a quality of implementation issue. We sendfile() pages directly out of the filesystem page cache without any blocking of modifications to the page contents, and the only way that works is if the card computes the checksum for us. If we sendfile() a page directly, we must compute a correct checksum no matter what the contents. We can't do this on the cpu before the data hits the device, because another thread of execution can go in and modify the page contents, which would invalidate the checksum and thus invalidate the packet. We cannot allow this. Blocking modifications is too expensive, so that's not an option either.
Re: [openib-general] Dropping NETIF_F_SG since no checksum feature.
From: Roland Dreier [EMAIL PROTECTED] Date: Tue, 10 Oct 2006 20:33:46 -0700

    Michael> My guess was, an extra pass over data is likely to be
    Michael> expensive - dirtying the cache if nothing else. But I do
    Michael> plan to measure that, and see.

I don't get it -- where's the extra pass? If you can't compute the checksum on the NIC then you have to compute it sometime on the CPU before passing the data to the NIC. Also, if you don't do checksumming on the card we MUST copy the data (be it from a user buffer, or from a filesystem page cache page) into a private buffer, since if the data changed the checksum would become invalid, as I mentioned in another email earlier. Therefore, since we have to copy anyway, it is always better to checksum in parallel with the copy. So the whole idea of SG without hw-checksum support is without much merit at all.
Re: [openib-general] Dropping NETIF_F_SG since no checksum feature.
From: Roland Dreier [EMAIL PROTECTED] Date: Tue, 10 Oct 2006 20:42:20 -0700 On the other hand I'm not sure how useful such a netdevice would be -- will non-sendfile() paths generate big packets even if the MTU is 64KB? non-sendfile() paths will generate big packets just fine, as long as the application is providing that much data.
Re: [openib-general] Dropping NETIF_F_SG since no checksum feature.
From: Roland Dreier [EMAIL PROTECTED] Date: Tue, 10 Oct 2006 20:49:09 -0700

    David> non-sendfile() paths will generate big packets just fine,
    David> as long as the application is providing that much data.

OK, cool. Will the big packets be non-linear skbs? If you had SG enabled (and thus checksumming offload too) then yes, you'll get a non-linear SKB. Otherwise, without SG, you'll get a fully linear SKB.
Re: [openib-general] [PATCH Round 5 0/3] Network Event Notifier Mechanism
From: Steve Wise [EMAIL PROTECTED] Date: Fri, 28 Jul 2006 13:28:49 -0500 Dave/Roland, is this patchset about ready to go? All 3 patches applied, thanks Steve.
Re: [openib-general] [PATCH Round 4 2/3] Core network changes to support network event notification.
From: Steve Wise [EMAIL PROTECTED] Date: Wed, 26 Jul 2006 11:15:43 -0500 Dave, what do you think about removing the user-space stuff for the first round of integration? IE: Just add netevents and kernel hooks to generate them. Sure.
Re: [openib-general] Suggestions for how to remove bus_to_virt()
From: Ralph Campbell [EMAIL PROTECTED] Date: Fri, 14 Jul 2006 15:27:07 -0700 Note that the current design only supports one IOMMU type in a system. This could support multiple IOMMU types at the same time. This is not true; the framework allows multiple such types, and in fact Sparc64 takes advantage of this. We have about 4 or 5 different PCI controllers, and the IOMMUs are slightly different in each. Even with the standard PCI DMA mapping calls, we can gather the platform private information necessary to program the IOMMU appropriately for a given chipset. The dma_mapping_ops idea will never get accepted by folks like Linus, for reasons I've outlined in previous emails in this thread. So, it's best to look elsewhere for solutions to your problem, such as the ideas used by the USB and IEEE 1394 device layers.
Re: [openib-general] Suggestions for how to remove bus_to_virt()
From: Ralph Campbell [EMAIL PROTECTED] Date: Wed, 12 Jul 2006 16:29:27 -0700 Currently, the ib_ipath driver requires that the mapping be one-to-one since there is no practical way to reverse IOMMU mappings. You can maintain a hash table that maps DMA addresses back to kernel mappings. Depending upon your situation, you can optimize this to use very small keys if you have some kind of other identification method for your buffers. That would be for dynamic mappings. If you were using consistent DMA memory, which I gather you're not, you could use the PCI DMA pool mechanism.
Re: [openib-general] Suggestions for how to remove bus_to_virt()
From: Roland Dreier [EMAIL PROTECTED] Date: Wed, 12 Jul 2006 17:11:26 -0700 A cleaner solution would be to make the dma_ API really use the device it's passed anyway, and allow drivers to override the standard PCI stuff nicely. But that would be major surgery, I guess. Clean but expensive, you should not force the rest of the kernel to eat the cost of something you want to do when it's totally unnecessary for most other users. For example, x86 never needs to do anything other than a direct virt_to_phys translation to produce a DMA address, no matter what bus the device is on. It's a single simple integer adjustment that can be done inline in about 2 or 3 instructions at most. Once you start allowing overrides then even x86 starts to eat the stupid costs of dereferencing some kind of device ops method. That doesn't make any sense, and that's why the DMA API works the way it does now. It's a platform or bus operation, not a device one. If you need device level DMA mapping semantics, create them for your device type. This is what USB does, btw.
Re: [openib-general] [PATCH 39 of 39] IB/ipath - use streaming copy in RDMA interrupt handler to reduce packet loss
From: Bryan O'Sullivan [EMAIL PROTECTED] Date: Thu, 29 Jun 2006 14:41:30 -0700

    +/*
    + * Copy data. Try not to pollute the dcache with the source data,
    + * because we won't be reading it again.
    + */
    +#if defined(CONFIG_X86_64)
    +void *ipath_memcpy_nc(void *dest, const void *src, size_t n);
    +#else
    +#define ipath_memcpy_nc(dest, src, n) memcpy(dest, src, n)
    +#endif

A facility like this doesn't belong in some arbitrary driver layer. It belongs as a generic facility the whole kernel could make use of. Please stop polluting the infiniband drivers with Opteron crap.
Re: [openib-general] [PATCH 38 of 39] IB/ipath - More changes to support InfiniPath on PowerPC 970 systems
From: Bryan O'Sullivan [EMAIL PROTECTED] Date: Thu, 29 Jun 2006 14:41:29 -0700

     ipath_core-$(CONFIG_X86_64) += ipath_wc_x86_64.o
    +ipath_core-$(CONFIG_PPC64) += ipath_wc_ppc64.o

Again, don't put these kinds of cpu specific functions into the infiniband driver. They are potentially globally useful, not something only Infiniband might want to do.
Re: [openib-general] [PATCH 38 of 39] IB/ipath - More changes to support InfiniPath on PowerPC 970 systems
From: Bryan O'Sullivan [EMAIL PROTECTED] Date: Thu, 29 Jun 2006 15:01:39 -0700 The support for write combining in the kernel is not in a state where that makes any sense at the moment. Please fix the generic code if it doesn't provide the facility you need at the moment. Don't shoehorn it into your driver just to make up for that.
Re: [openib-general] [PATCH 39 of 39] IB/ipath - use streaming copy in RDMA interrupt handler to reduce packet loss
From: Bryan O'Sullivan [EMAIL PROTECTED] Date: Thu, 29 Jun 2006 14:59:37 -0700 It could, indeed. In fact, we had that discussion here before I sent this patch in. It presumably wants to live in lib/, and acquire a more generic name. What name will capture the uncached-read-but-cached-write semantics in a useful fashion? memcpy_nc? I'm not good with names :-) Note that there might also be cases where using such a memcpy variant is the wrong thing to do. For example, for a very tightly coupled CMT cpu implementation which has the memory controller, L2 cache, PCI controller, etc. all on the same die, where the PCI controller makes use of the L2 cache just like the cpu threads do, using this kind of memcpy would always be the wrong thing to do.
Re: [openib-general] [PATCH 39 of 39] IB/ipath - use streaming copy in RDMA interrupt handler to reduce packet loss
From: Bryan O'Sullivan [EMAIL PROTECTED] Date: Thu, 29 Jun 2006 16:34:23 -0700 I'm not quite following you, though I assume you're referring to Niagara or Rock :-) Are you saying a memcpy_nc would do worse than plain memcpy, or worse than some other memcpy-like routine? It would do worse than memcpy. If you bypass the L2 cache, it's pointless because the next agent (PCI controller, CPU thread, etc.) is going to need the data in the L2 cache. It's better in that kind of setup to eat the L2 cache miss overhead in memcpy, since memcpy can usually use prefetching and store buffering to absorb some of the L2 miss costs.
Re: [openib-general] [PATCH 39 of 39] IB/ipath - use streaming copy in RDMA interrupt handler to reduce packet loss
From: Rick Jones [EMAIL PROTECTED] Date: Thu, 29 Jun 2006 17:28:50 -0700 I thought that most PCI controllers (that is to say the things bridging PCI to the rest of the system) could do prefetching and/or that PCI-X (if not PCI, no idea about PCI-e) cards could issue multiple transactions anyway? People doing deep CMT chips have found out that all of that prefetching and store buffering is unnecessary when everything is so tightly integrated. All of the previous UltraSPARC boxes before Niagara had a streaming cache sitting on the PCI controller. It basically prefetched for reads and collected writes from PCI devices into cacheline sized chunks. The PCI controller in the current Niagara systems has none of that stuff.
Re: [openib-general] [PATCH 39 of 39] IB/ipath - use streaming copy in RDMA interrupt handler to reduce packet loss
From: Rick Jones [EMAIL PROTECTED] Date: Thu, 29 Jun 2006 17:44:05 -0700 Then is prefetching in memcpy really that important to them? Not really, the thread just blocks while waiting for memory. On stores they do a cacheline fill optimization similar to the powerpc. Relying on PCI-X devices to issue multiple requests then? Perhaps :)
Re: [openib-general] [PATCH 0/2][RFC] Network Event Notifier Mechanism
Most of the folks capable of reviewing networking changes listen in on the netdev@vger.kernel.org mailing list, not here. Thanks.