Re: attach additional value to skb
On Mon, Oct 22, 2007 at 10:03:47PM +0200, Yakov Lerner wrote: > I want to add my custom destructor to the skb, the function pointer > to be called at skb destruction time. Will the following work: if I > push address of my_destructor to the > sk-buff->destructor field, and then when my_destructor > is called, I call the (saved) old value of sk-buff->destructor ? > Will this work ? It might, provided nothing else runs after you that will unilaterally change the ->destructor field. I.e., it depends at which point in the skb's life-time you set your hook. Cheers, Muli -- SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007 http://www.haifa.il.ibm.com/Workshops/systor2007/ Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007 - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/2] shrink size of scatterlist on common i386/x86-64
On Mon, Jul 09, 2007 at 12:06:40AM -0700, David Miller wrote: > > That works, but isn't optimal when you have an isolation-capable > > IOMMU and you want the full isolation properties of the IOMMU. If > > you only flush the IOTLB when the allocator wraps around, a stale > > entry in the IOTLB can allow a DMA to go through for an IO entry > > that has already been unmapped. One way to mitigate that and still > > retain full isolation is to make sure no one else gets to use the > > frames that are the targets of the DMA until the translation has > > been flushed out of the IOTLB, but that requires pretty deep > > surgery. > > Virtualization sucks doesn't it? :-) no comment :-) FWIW isolation capable IOMMUs are also useful to catch DMA errors when drivers program devices to DMA where they shouldn't. Sure, drivers can trash the kernel in other ways, but a printk beats random memory corruption any day of the week. > It's one of the worst aspects of all of this virtualization > business. In my view it makes no sense to split up the physical > hardware. Just give control nodes complete access to everything and > instead of playing games partializing real hardware, just give > virtual instances to everybody and be done with all of this > complexity. Virtual instances have a non-negligible performance cost and development cost - you need a virtual driver for any device or class of devices out there. Not to say that they aren't useful, just that direct access (aka "passthrough") has its uses too. > Anyways, hypervisors et al. have already decided to do this > braindamage so you will need to find some way to make it go fast now > won't you? :) Working on it :-) Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/2] shrink size of scatterlist on common i386/x86-64
On Fri, Jul 06, 2007 at 12:20:19PM -0700, David Miller wrote: > From: "Williams, Mitch A" <[EMAIL PROTECTED]> > Date: Fri, 6 Jul 2007 10:14:56 -0700 > > > In my opinion, IOMMU table locking is the major issue with this > > type of architecture. Since both Intel and AMD are touting IOMMUs > > for virtual- ization support, this is an issue that's going to > > need a lot of scrutiny. > > For the allocation of IOMMU entries themselves you can play tricks > using atomic operations on 64-bit words of the allocator bitmap to > avoid locking that. Hmm, any pointers? > You can use per-cpu salts to determine where to start the search and > avoid hitting the same cachelines as other cpus working on the same > table. > > But you'll need to lock in order to flush the IOMMU tlb I'm afraid. > The way to mitigate that is to only flush the IOMMU tlb once per > allocator generation. That works, but isn't optimal when you have an isolation-capable IOMMU and you want the full isolation properties of the IOMMU. If you only flush the IOTLB when the allocator wraps around, a stale entry in the IOTLB can allow a DMA to go through for an IO entry that has already been unmapped. One way to mitigate that and still retain full isolation is to make sure no one else gets to use the frames that are the targets of the DMA until the translation has been flushed out of the IOTLB, but that requires pretty deep surgery. Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/2] shrink size of scatterlist on common i386/x86-64
On Fri, Jul 06, 2007 at 10:14:56AM -0700, Williams, Mitch A wrote: > David Miller wrote: > >> Okay, but then using SG lists makes skbuff's much bigger. > >> > >>fraglistscatterlistper skbuff > >> 32 bit 8 20 +12 * 18 = +216! > >> 64 bit 16 32 +16 * 18 = +288 > >> > >> So never mind... > > > >I know, this is why nobody ever really tries to tackle this. > > > >> I'll do a fraglist to scatter list set of routines, but not sure > >> if it's worth it. > > > >It's better to add dma_map_skb() et al. interfaces to be honest. > > > >Also even with the scatterlist idea, we'd still need to do two > >map calls, one for skb->data and one for the page vector. > > FWIW, I tried this about a year ago to try to improve e1000 > performance on pSeries. I was hoping to simplify the driver > transmit code and make IOMMU mapping easier. This was on 2.6.16 or > thereabouts. > > Net result: zilch. No performance increase, no noticeable CPU > utilization > benefits. Nothing. So I dropped it. Do you have pointers to the patches perchance? > Slightly off topic: > The real problem that I saw on pSeries is lock contention for the IOMMU. > It's architected with a single table per slot, which is great in that > two boards in separate slots won't have lock contention. However, this > all goes out the window when you drop a quad-port gigabit adapter in > there. > The time spent waiting for the IOMMU table lock goes up exponentially > as you activate each additional port. > > In my opinion, IOMMU table locking is the major issue with this type > of architecture. Since both Intel and AMD are touting IOMMUs for > virtual- ization support, this is an issue that's going to need a > lot of scrutiny. Agreed. Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: compile breakage due to [SK_BUFF]: Convert skb->end to sk_buff_data_t
On Sun, May 13, 2007 at 08:27:35PM -0300, Arnaldo Carvalho de Melo wrote: > Well, the fix is easy, can you provide a patch? Sure, but I was hoping you had a follow-on patch to get rid of the ugly ifdefs in .c files that would make it irrelevant? if not I'll whip out the obvious patch. Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
compile breakage due to [SK_BUFF]: Convert skb->end to sk_buff_data_t
On Fri, Apr 27, 2007 at 05:03:44PM +, Linux Kernel Mailing List wrote: > Gitweb: > http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4305b541357ddbd205aa145dc378926b7cb12283 > Commit: 4305b541357ddbd205aa145dc378926b7cb12283 > Parent: 27a884dc3cb63b93c2b3b643f5b31eed5f8a4d26 > Author: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> > AuthorDate: Thu Apr 19 20:43:29 2007 -0700 > Committer: David S. Miller <[EMAIL PROTECTED]> > CommitDate: Wed Apr 25 22:26:29 2007 -0700 > > [SK_BUFF]: Convert skb->end to sk_buff_data_t > > Now to convert the last one, skb->data, that will allow many > simplifications > and removal of some of the offset helpers. > > Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]> > Signed-off-by: David S. Miller <[EMAIL PROTECTED]> This patch and the previous one breaks the compilation on one of my machines. Specifically, this bit: > @@ -632,12 +644,13 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, > int ntail, > /* Copy only real data... and, alas, header. This should be >* optimized for the cases when header is void. */ > memcpy(data + nhead, skb->head, > - skb->tail > -#ifndef NET_SKBUFF_DATA_USES_OFFSET > - - skb->head > +#ifdef NET_SKBUFF_DATA_USES_OFFSET > + skb->tail); > +#else > + skb->tail - skb->head); > #endif > - ); > - memcpy(data + size, skb->end, sizeof(struct skb_shared_info)); > + memcpy(data + size, skb_end_pointer(skb), > +sizeof(struct skb_shared_info)); > > for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) > get_page(skb_shinfo(skb)->frags[i].page); Causes this compile error: /home/muli/kernel/trident/trident.git/net/core/skbuff.c:648:1: directives may not be used inside a macro argument /home/muli/kernel/trident/trident.git/net/core/skbuff.c:647:39: unterminated argument list invoking macro "memcpy" /home/muli/kernel/trident/trident.git/net/core/skbuff.c: In function `pskb_expand_head': /home/muli/kernel/trident/trident.git/net/core/skbuff.c:651: `memcpy' undeclared (first use in this function) /home/muli/kernel/trident/trident.git/net/core/skbuff.c:651: (Each undeclared identifier is reported only once /home/muli/kernel/trident/trident.git/net/core/skbuff.c:651: for each function it appears in.) /home/muli/kernel/trident/trident.git/net/core/skbuff.c:651: syntax error before "skb" [EMAIL PROTECTED]:~/kernel/trident/trident.git$ gcc -v Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/3.2.3/specs Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --with-system-zlib --enable-__cxa_atexit --host=i386-redhat-linux Thread model: posix gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-53) It happens because memcpy ends up being a macro, and this gcc dislikes having a preprocessor directive inside a macro argument. I don't know if if it's valid in general, but Documentation/Changes does say we still support gcc 3.2... Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
On Sun, Apr 08, 2007 at 08:36:14AM +0300, Avi Kivity wrote: > That is not the common case. Nor is it true when there is a > mismatch between the card's capabilties and guest expectations and > constraints. For example, guest memory is not physically contiguous > so a NIC that won't do scatter/gather will require bouncing (or an > iommu, but that's not here yet). Actually, Allen Key from Intel just posted the first VT-d patches to xen-devel a couple of days ago. I wonder if anyone is working on kvm support (which would require Linux support). Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [NET] fib_rules: Flush route cache after rule modifications
On Tue, Mar 27, 2007 at 03:21:59PM +0200, Thomas Graf wrote: > The results of FIB rules lookups are cached in the routing cache > except for IPv6 as no such cache exists. So far, it was the > responsibility of the user to flush the cache after modifying any > rules. This lead to many false bug reports due to misunderstanding > of this concept. > > This patch automatically flushes the route cache after inserting > or deleting a rule. > > Signed-off-by: Thomas Graf <[EMAIL PROTECTED]> > > Index: net-2.6.22/include/net/fib_rules.h > === > --- net-2.6.22.orig/include/net/fib_rules.h 2007-03-27 13:54:52.0 > +0200 > +++ net-2.6.22/include/net/fib_rules.h2007-03-27 14:16:24.0 > +0200 > @@ -59,6 +59,10 @@ struct fib_rules_ops > u32 (*default_pref)(void); > size_t (*nlmsg_payload)(struct fib_rule *); > > + /* Called after modifications to the rules set, must flush > + * the route cache if one exists. */ > + void(*flush_cache)(void); > + > int nlgroup; > struct nla_policy *policy; > struct list_head*rules_list; > Index: net-2.6.22/net/core/fib_rules.c > === > --- net-2.6.22.orig/net/core/fib_rules.c 2007-03-27 13:53:29.0 > +0200 > +++ net-2.6.22/net/core/fib_rules.c 2007-03-27 13:59:20.0 +0200 > @@ -44,6 +44,12 @@ static void rules_ops_put(struct fib_rul > module_put(ops->owner); > } > > +static void flush_route_cache(struct fib_rules_ops *ops) > +{ > + if (ops->flush_cache) > + ops->flush_cache(); > +} > + > int fib_rules_register(struct fib_rules_ops *ops) > { > int err = -EEXIST; > @@ -315,6 +321,7 @@ static int fib_nl_newrule(struct sk_buff > > notify_rule_change(RTM_NEWRULE, rule, ops, nlh, NETLINK_CB(skb).pid); > rules_ops_put(ops); > + flush_route_cache(ops); > return 0; > > errout_free: > @@ -405,6 +412,7 @@ static int fib_nl_delrule(struct sk_buff > NETLINK_CB(skb).pid); > fib_rule_put(rule); > rules_ops_put(ops); > + flush_route_cache(ops); > return 0; > } That looks like a bug - shouldn't we flush the cache first, then do the rules_ops_put()? Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: network devices don't handle pci_dma_mapping_error()'s
On Thu, Dec 07, 2006 at 11:48:14AM +0530, Amit S. Kale wrote: > On the x86_64 boxes that don't feature iommu functionality (because the > motherboard disables it or because Linux can't handle it) Linux bounce buffer > framework automatically comes into picture. Could we have the same framework > take over when IOMMU space is over? I don't think this is possible with > present code, though. We probably can have fallback_dma_ops in addition to > dma_ops. In the general case, no - some platforms (including x86-64 on IBM's high end servers!) have an isolation capable IOMMU, which means all DMA mappings need to go through it, so a general mechanism to cope with DMA mappings running out is still needed. Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: network devices don't handle pci_dma_mapping_error()'s
On Wed, Dec 06, 2006 at 10:16:44AM -0800, Stephen Hemminger wrote: > I think it is really only an issue for drivers that turn on HIGH_DMA > and have limited mask values. The majority of drivers either only > handle 32 bit (!HIGH_DMA) or do full 64 bit mapping. I don't know > the details of how we manage IOMMU, but doesn't mapping always work > for those drivers. It's up to an IOMMU (DMA-API) implementation to define what constitutes a mapping error, e.g., Calgary and GART on x86-64 will return bad_dma_address from the mapping functions when they run out of entries in the IO space, which can happen regardless of the mask. Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: network devices don't handle pci_dma_mapping_error()'s
On Mon, Dec 04, 2006 at 10:39:49AM -0800, Stephen Hemminger wrote: > I notice that no current network driver handles dma mapping errors. > Might that be part of the problem. On i386, this never happens, and > it would be rare on most others. IOMMUs are already available on x86-64 and are going to get widespread with the the introduction of IOMMUs from Intel and AMD. Might as well fix it now... How about CONFIG_DEBUG_DMA_API that does book-keeping and yells if a driver is mis-using the DMA API? Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH,RESEND] smc91x: disable DMA mode on the logicpd pxa270
On Sat, Aug 12, 2006 at 11:30:35AM +0200, Lennert Buytenhek wrote: > Nope. Unfortunately, the line that made that clear didn't fall inside > the patch context: Ah, I see. Thanks. Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH,RESEND] smc91x: disable DMA mode on the logicpd pxa270
On Sat, Aug 12, 2006 at 10:55:07AM +0200, Lennert Buytenhek wrote: > Enabling PXA DMA for the smc91x on the logicpd pxa270 produces > unacceptable interference with the TFT panel, so disable it. Doesn't this disable it for all users, though? are there any other users? should this be changed from a #define to a run-time toggle? Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4] ehea: queue managment
On Wed, Jun 07, 2006 at 10:29:50AM -0700, Stephen Hemminger wrote: > Be careful about global namespace issues. Stick to one prefix like ehea_ > for all non static function names. Consider putting all in one file, or > use #include to cause it to be one compilation unit. I thought including .c files was discouraged (if not, it should be...) Cheers, Muli - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html