Re: attach additional value to skb

2007-10-22 Thread Muli Ben-Yehuda
On Mon, Oct 22, 2007 at 10:03:47PM +0200, Yakov Lerner wrote:

> I want to add my custom destructor to the skb, the function pointer
> to be called at skb destruction time. Will the following work: if I
> push address of my_destructor to the
> sk-buff->destructor field,  and then when my_destructor
> is called, I call the (saved)  old value of  sk-buff->destructor ?
> Will this work ?

It might, provided nothing else runs after you that will unilaterally
change the ->destructor field. I.e., it depends at which point in the
skb's life-time you set your hook.

Cheers,
Muli
-- 
SYSTOR 2007 --- 1st Annual Haifa Systems and Storage Conference 2007
http://www.haifa.il.ibm.com/Workshops/systor2007/

Virtualization workshop: Oct 29th, 2007 | Storage workshop: Oct 30th, 2007
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/2] shrink size of scatterlist on common i386/x86-64

2007-07-09 Thread Muli Ben-Yehuda
On Mon, Jul 09, 2007 at 12:06:40AM -0700, David Miller wrote:

> > That works, but isn't optimal when you have an isolation-capable
> > IOMMU and you want the full isolation properties of the IOMMU. If
> > you only flush the IOTLB when the allocator wraps around, a stale
> > entry in the IOTLB can allow a DMA to go through for an IO entry
> > that has already been unmapped. One way to mitigate that and still
> > retain full isolation is to make sure no one else gets to use the
> > frames that are the targets of the DMA until the translation has
> > been flushed out of the IOTLB, but that requires pretty deep
> > surgery.
> 
> Virtualization sucks doesn't it? :-)

no comment :-) FWIW isolation capable IOMMUs are also useful to catch
DMA errors when drivers program devices to DMA where they
shouldn't. Sure, drivers can trash the kernel in other ways, but a
printk beats random memory corruption any day of the week.

> It's one of the worst aspects of all of this virtualization
> business.  In my view it makes no sense to split up the physical
> hardware.  Just give control nodes complete access to everything and
> instead of playing games partializing real hardware, just give
> virtual instances to everybody and be done with all of this
> complexity.

Virtual instances have a non-negligible performance cost and
development cost - you need a virtual driver for any device or class
of devices out there. Not to say that they aren't useful, just that
direct access (aka "passthrough") has its uses too.

> Anyways, hypervisors et al. have already decided to do this
> braindamage so you will need to find some way to make it go fast now
> won't you? :)

Working on it :-)

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/2] shrink size of scatterlist on common i386/x86-64

2007-07-08 Thread Muli Ben-Yehuda
On Fri, Jul 06, 2007 at 12:20:19PM -0700, David Miller wrote:
> From: "Williams, Mitch A" <[EMAIL PROTECTED]>
> Date: Fri, 6 Jul 2007 10:14:56 -0700
> 
> > In my opinion, IOMMU table locking is the major issue with this
> > type of architecture.  Since both Intel and AMD are touting IOMMUs
> > for virtual- ization support, this is an issue that's going to
> > need a lot of scrutiny.
> 
> For the allocation of IOMMU entries themselves you can play tricks
> using atomic operations on 64-bit words of the allocator bitmap to
> avoid locking that.

Hmm, any pointers?

> You can use per-cpu salts to determine where to start the search and
> avoid hitting the same cachelines as other cpus working on the same
> table.
> 
> But you'll need to lock in order to flush the IOMMU tlb I'm afraid.
> The way to mitigate that is to only flush the IOMMU tlb once per
> allocator generation.

That works, but isn't optimal when you have an isolation-capable IOMMU
and you want the full isolation properties of the IOMMU. If you only
flush the IOTLB when the allocator wraps around, a stale entry in the
IOTLB can allow a DMA to go through for an IO entry that has already
been unmapped. One way to mitigate that and still retain full
isolation is to make sure no one else gets to use the frames that are
the targets of the DMA until the translation has been flushed out of
the IOTLB, but that requires pretty deep surgery.

Cheers,
Muli


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/2] shrink size of scatterlist on common i386/x86-64

2007-07-08 Thread Muli Ben-Yehuda
On Fri, Jul 06, 2007 at 10:14:56AM -0700, Williams, Mitch A wrote:
> David Miller wrote:
> >> Okay, but then using SG lists makes skbuff's much bigger.
> >> 
> >>fraglistscatterlistper skbuff
> >> 32 bit 8   20  +12 * 18 = +216!
> >> 64 bit 16  32  +16 * 18 = +288
> >> 
> >> So never mind...
> >
> >I know, this is why nobody ever really tries to tackle this.
> >
> >> I'll do a fraglist to scatter list set of routines, but not sure
> >> if it's worth it.
> >
> >It's better to add dma_map_skb() et al. interfaces to be honest.
> >
> >Also even with the scatterlist idea, we'd still need to do two
> >map calls, one for skb->data and one for the page vector.
> 
> FWIW, I tried this about a year ago to try to improve e1000
> performance on pSeries.  I was hoping to simplify the driver
> transmit code and make IOMMU mapping easier.  This was on 2.6.16 or
> thereabouts.
> 
> Net result:  zilch.  No performance increase, no noticeable CPU
> utilization
> benefits.  Nothing.  So I dropped it.

Do you have pointers to the patches perchance?

> Slightly off topic:
> The real problem that I saw on pSeries is lock contention for the IOMMU.
> It's architected with a single table per slot, which is great in that
> two boards in separate slots won't have lock contention.  However, this
> all goes out the window when you drop a quad-port gigabit adapter in
> there.
> The time spent waiting for the IOMMU table lock goes up exponentially
> as you activate each additional port.
> 
> In my opinion, IOMMU table locking is the major issue with this type
> of architecture.  Since both Intel and AMD are touting IOMMUs for
> virtual- ization support, this is an issue that's going to need a
> lot of scrutiny.

Agreed.

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: compile breakage due to [SK_BUFF]: Convert skb->end to sk_buff_data_t

2007-05-13 Thread Muli Ben-Yehuda
On Sun, May 13, 2007 at 08:27:35PM -0300, Arnaldo Carvalho de Melo wrote:

> Well, the fix is easy, can you provide a patch?

Sure, but I was hoping you had a follow-on patch to get rid of the
ugly ifdefs in .c files that would make it irrelevant? if not I'll
whip out the obvious patch.

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


compile breakage due to [SK_BUFF]: Convert skb->end to sk_buff_data_t

2007-05-13 Thread Muli Ben-Yehuda
On Fri, Apr 27, 2007 at 05:03:44PM +, Linux Kernel Mailing List wrote:

> Gitweb: 
> http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4305b541357ddbd205aa145dc378926b7cb12283
> Commit: 4305b541357ddbd205aa145dc378926b7cb12283
> Parent: 27a884dc3cb63b93c2b3b643f5b31eed5f8a4d26
> Author: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
> AuthorDate: Thu Apr 19 20:43:29 2007 -0700
> Committer:  David S. Miller <[EMAIL PROTECTED]>
> CommitDate: Wed Apr 25 22:26:29 2007 -0700
> 
> [SK_BUFF]: Convert skb->end to sk_buff_data_t
> 
> Now to convert the last one, skb->data, that will allow many 
> simplifications
> and removal of some of the offset helpers.
> 
> Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>
> Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

This patch and the previous one breaks the compilation on one of my
machines. Specifically, this bit:

> @@ -632,12 +644,13 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, 
> int ntail,
>   /* Copy only real data... and, alas, header. This should be
>* optimized for the cases when header is void. */
>   memcpy(data + nhead, skb->head,
> - skb->tail
> -#ifndef NET_SKBUFF_DATA_USES_OFFSET
> - - skb->head
> +#ifdef NET_SKBUFF_DATA_USES_OFFSET
> + skb->tail);
> +#else
> + skb->tail - skb->head);
>  #endif
> - );
> - memcpy(data + size, skb->end, sizeof(struct skb_shared_info));
> + memcpy(data + size, skb_end_pointer(skb),
> +sizeof(struct skb_shared_info));
>  
>   for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
>   get_page(skb_shinfo(skb)->frags[i].page);

Causes this compile error:

/home/muli/kernel/trident/trident.git/net/core/skbuff.c:648:1: directives may 
not be used inside a macro argument
/home/muli/kernel/trident/trident.git/net/core/skbuff.c:647:39: unterminated 
argument list invoking macro "memcpy"
/home/muli/kernel/trident/trident.git/net/core/skbuff.c: In function 
`pskb_expand_head':
/home/muli/kernel/trident/trident.git/net/core/skbuff.c:651: `memcpy' 
undeclared (first use in this function)
/home/muli/kernel/trident/trident.git/net/core/skbuff.c:651: (Each undeclared 
identifier is reported only once
/home/muli/kernel/trident/trident.git/net/core/skbuff.c:651: for each function 
it appears in.)
/home/muli/kernel/trident/trident.git/net/core/skbuff.c:651: syntax error 
before "skb"

[EMAIL PROTECTED]:~/kernel/trident/trident.git$ gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/3.2.3/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man 
--infodir=/usr/share/info --enable-shared --enable-threads=posix 
--disable-checking --with-system-zlib --enable-__cxa_atexit 
--host=i386-redhat-linux
Thread model: posix
gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-53)

It happens because memcpy ends up being a macro, and this gcc dislikes
having a preprocessor directive inside a macro argument. I don't know
if if it's valid in general, but Documentation/Changes does say we
still support gcc 3.2...

Cheers,
Muli

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

2007-04-08 Thread Muli Ben-Yehuda
On Sun, Apr 08, 2007 at 08:36:14AM +0300, Avi Kivity wrote:

> That is not the common case.  Nor is it true when there is a
> mismatch between the card's capabilties and guest expectations and
> constraints.  For example, guest memory is not physically contiguous
> so a NIC that won't do scatter/gather will require bouncing (or an
> iommu, but that's not here yet).

Actually, Allen Key from Intel just posted the first VT-d patches to
xen-devel a couple of days ago. I wonder if anyone is working on kvm
support (which would require Linux support).

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [NET] fib_rules: Flush route cache after rule modifications

2007-03-27 Thread Muli Ben-Yehuda
On Tue, Mar 27, 2007 at 03:21:59PM +0200, Thomas Graf wrote:

> The results of FIB rules lookups are cached in the routing cache
> except for IPv6 as no such cache exists. So far, it was the
> responsibility of the user to flush the cache after modifying any
> rules. This lead to many false bug reports due to misunderstanding
> of this concept.
> 
> This patch automatically flushes the route cache after inserting
> or deleting a rule.
> 
> Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>
> 
> Index: net-2.6.22/include/net/fib_rules.h
> ===
> --- net-2.6.22.orig/include/net/fib_rules.h   2007-03-27 13:54:52.0 
> +0200
> +++ net-2.6.22/include/net/fib_rules.h2007-03-27 14:16:24.0 
> +0200
> @@ -59,6 +59,10 @@ struct fib_rules_ops
>   u32 (*default_pref)(void);
>   size_t  (*nlmsg_payload)(struct fib_rule *);
>  
> + /* Called after modifications to the rules set, must flush
> +  * the route cache if one exists. */
> + void(*flush_cache)(void);
> +
>   int nlgroup;
>   struct nla_policy   *policy;
>   struct list_head*rules_list;
> Index: net-2.6.22/net/core/fib_rules.c
> ===
> --- net-2.6.22.orig/net/core/fib_rules.c  2007-03-27 13:53:29.0 
> +0200
> +++ net-2.6.22/net/core/fib_rules.c   2007-03-27 13:59:20.0 +0200
> @@ -44,6 +44,12 @@ static void rules_ops_put(struct fib_rul
>   module_put(ops->owner);
>  }
>  
> +static void flush_route_cache(struct fib_rules_ops *ops)
> +{
> + if (ops->flush_cache)
> + ops->flush_cache();
> +}
> +
>  int fib_rules_register(struct fib_rules_ops *ops)
>  {
>   int err = -EEXIST;
> @@ -315,6 +321,7 @@ static int fib_nl_newrule(struct sk_buff
>  
>   notify_rule_change(RTM_NEWRULE, rule, ops, nlh, NETLINK_CB(skb).pid);
>   rules_ops_put(ops);
> + flush_route_cache(ops);
>   return 0;
>  
>  errout_free:
> @@ -405,6 +412,7 @@ static int fib_nl_delrule(struct sk_buff
>  NETLINK_CB(skb).pid);
>   fib_rule_put(rule);
>   rules_ops_put(ops);
> + flush_route_cache(ops);
>   return 0;
>   }

That looks like a bug - shouldn't we flush the cache first, then do
the rules_ops_put()?

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: network devices don't handle pci_dma_mapping_error()'s

2006-12-07 Thread Muli Ben-Yehuda
On Thu, Dec 07, 2006 at 11:48:14AM +0530, Amit S. Kale wrote:

> On the x86_64 boxes that don't feature iommu functionality (because the 
> motherboard disables it or because Linux can't handle it) Linux bounce buffer 
> framework automatically comes into picture. Could we have the same framework 
> take over when IOMMU space is over? I don't think this is possible with 
> present code, though. We probably can have fallback_dma_ops in addition to 
> dma_ops.

In the general case, no - some platforms (including x86-64 on IBM's
high end servers!) have an isolation capable IOMMU, which means all
DMA mappings need to go through it, so a general mechanism to cope
with DMA mappings running out is still needed.

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: network devices don't handle pci_dma_mapping_error()'s

2006-12-06 Thread Muli Ben-Yehuda
On Wed, Dec 06, 2006 at 10:16:44AM -0800, Stephen Hemminger wrote:

> I think it is really only an issue for drivers that turn on HIGH_DMA
> and have limited mask values. The majority of drivers either only
> handle 32 bit (!HIGH_DMA) or do full 64 bit mapping. I don't know
> the details of how we manage IOMMU, but doesn't mapping always work
> for those drivers.

It's up to an IOMMU (DMA-API) implementation to define what
constitutes a mapping error, e.g., Calgary and GART on x86-64 will
return bad_dma_address from the mapping functions when they run out of
entries in the IO space, which can happen regardless of the mask.

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: network devices don't handle pci_dma_mapping_error()'s

2006-12-04 Thread Muli Ben-Yehuda
On Mon, Dec 04, 2006 at 10:39:49AM -0800, Stephen Hemminger wrote:

> I notice that no current network driver handles dma mapping errors.
> Might that be part of the problem.  On i386, this never happens, and
> it would be rare on most others.

IOMMUs are already available on x86-64 and are going to get widespread
with the the introduction of IOMMUs from Intel and AMD. Might as well
fix it now...

How about CONFIG_DEBUG_DMA_API that does book-keeping and yells if a
driver is mis-using the DMA API?

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH,RESEND] smc91x: disable DMA mode on the logicpd pxa270

2006-08-12 Thread Muli Ben-Yehuda
On Sat, Aug 12, 2006 at 11:30:35AM +0200, Lennert Buytenhek wrote:

> Nope.  Unfortunately, the line that made that clear didn't fall inside
> the patch context:

Ah, I see. Thanks.

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH,RESEND] smc91x: disable DMA mode on the logicpd pxa270

2006-08-12 Thread Muli Ben-Yehuda
On Sat, Aug 12, 2006 at 10:55:07AM +0200, Lennert Buytenhek wrote:
> Enabling PXA DMA for the smc91x on the logicpd pxa270 produces
> unacceptable interference with the TFT panel, so disable it.

Doesn't this disable it for all users, though? are there any other
users? should this be changed from a #define to a run-time toggle?

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] ehea: queue managment

2006-06-07 Thread Muli Ben-Yehuda
On Wed, Jun 07, 2006 at 10:29:50AM -0700, Stephen Hemminger wrote:

> Be careful about global namespace issues. Stick to one prefix like ehea_
> for all non static function names. Consider putting all in one file, or
> use #include to cause it to be one compilation unit.

I thought including .c files was discouraged (if not, it should be...)

Cheers,
Muli
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html