Re: [PATCH] Virtual ethernet tunnel (v.2)
Ben Greear wrote:
> Pavel Emelianov wrote:
>
>>> I would also like some way to identify veth from other device types,
>>> preferably something like a value in sysfs. However, that should not
>>> hold up
>>
>> We can do this with ethtool. It can get and print the driver name of
>> the device.
>
> I think I'd like something in sysfs that we could query for any
> interface. Possible return strings could be:
> VLAN
> VETH
> ETH
> PPP
> BRIDGE
> AP  /* wifi access point interface */
> STA /* wifi station */
>
> I will cook up a patch for consideration after veth goes in.

The rtnl_link API gives you the name of the driver (IFLA_INFO_KIND).
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Virtual ethernet tunnel (v.2)
Carl-Daniel Hailfinger wrote:
> On 08.06.2007 19:00, Ben Greear wrote:
>> I have another sysfs patch that allows setting a default skb->mark for
>> an interface so that you can set the skb->mark before it hits the
>> connection tracking logic, but I've been told this one has very little
>> chance of getting into the kernel. The skb->mark patch is only useful
>> (as far as I can tell) if you also include a patch Patrick McHardy did
>> for me that allowed the conn-tracking logic to use skb->mark as part of
>> its tuple. This allows me to do NAT between virtual routers (routing
>> tables) on the same machine using veth-equivalent drivers to connect the
>> routers. He thinks this will probably not ever get into the kernel either.
>
> Are these patches available somewhere? I'm currently doing NAT between
> virtual routers by some advanced iproute2/iptables trickery, but I have
> no way to handle the occasional tuple conflict.

A consolidated patch against 2.6.20.12 is here. It has a lot more than
just the patches mentioned above, but it shouldn't hurt anything to have
the whole patch applied:

http://www.candelatech.com/oss/candela_2.6.20.patch

The original patch for using skb->mark as a tuple was written by Patrick
McHardy, and is here:

http://www.candelatech.com/oss/skb_mark_conntrack.patch

His patch merged with my patch to sysfs to set skb->mark on ingress is here:

http://www.candelatech.com/oss/conntrack_mark_with_ssyctl.patch

Thanks,
Ben

> Regards,
> Carl-Daniel

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com
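For readers wondering why a mark in the conntrack tuple helps: two virtual routers on one host can see flows with identical addresses, ports and protocol, and a plain 5-tuple lookup treats them as one connection (the "occasional tuple conflict" above). Adding the per-interface mark to the key keeps them apart. The C sketch below only illustrates that comparison in userspace; `struct flow_tuple`, `tuple_eq_5` and `tuple_eq_marked` are hypothetical names and do not reflect the kernel's conntrack tuple layout or the contents of Patrick's patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified flow key for illustration only;
 * NOT the kernel's conntrack tuple layout. */
struct flow_tuple {
	uint32_t src_ip;   /* IPv4 source address, host byte order */
	uint32_t dst_ip;   /* IPv4 destination address */
	uint16_t src_port;
	uint16_t dst_port;
	uint8_t  proto;    /* e.g. 6 = TCP, 17 = UDP */
	uint32_t mark;     /* default mark assigned per interface on ingress */
};

/* Classic 5-tuple match: the mark is ignored, so identical flows seen by
 * two virtual routers collide. */
bool tuple_eq_5(const struct flow_tuple *a, const struct flow_tuple *b)
{
	return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
	       a->src_port == b->src_port && a->dst_port == b->dst_port &&
	       a->proto == b->proto;
}

/* Mark-aware match: the same 5-tuple with different marks is treated
 * as a different flow. */
bool tuple_eq_marked(const struct flow_tuple *a, const struct flow_tuple *b)
{
	return tuple_eq_5(a, b) && a->mark == b->mark;
}
```

With marks 1 and 2 assigned to the two routers' ingress interfaces, the same TCP flow 10.0.0.1:1234 -> 10.0.0.2:80 matches under `tuple_eq_5` but not under `tuple_eq_marked`.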
Re: [PATCH] Virtual ethernet tunnel (v.2)
On 08.06.2007 19:00, Ben Greear wrote:
> I have another sysfs patch that allows setting a default skb->mark for
> an interface so that you can set the skb->mark before it hits the
> connection tracking logic, but I've been told this one has very little
> chance of getting into the kernel. The skb->mark patch is only useful
> (as far as I can tell) if you also include a patch Patrick McHardy did
> for me that allowed the conn-tracking logic to use skb->mark as part of
> its tuple. This allows me to do NAT between virtual routers (routing
> tables) on the same machine using veth-equivalent drivers to connect the
> routers. He thinks this will probably not ever get into the kernel either.

Are these patches available somewhere? I'm currently doing NAT between
virtual routers by some advanced iproute2/iptables trickery, but I have
no way to handle the occasional tuple conflict.

Regards,
Carl-Daniel
Re: [PATCH] Virtual ethernet tunnel (v.2)
Pavel Emelianov wrote:
> Ben Greear wrote:
>
> [snip]
>
>>>> I would also like some way to identify veth from other device types,
>>>> preferably something like a value in sysfs. However, that should not
>>>> hold up
>>>
>>> We can do this with ethtool. It can get and print the driver name of
>>> the device.
>>
>> I think I'd like something in sysfs that we could query for any
>> interface. Possible return strings could be:
>> VLAN
>> VETH
>> ETH
>> PPP
>> BRIDGE
>> AP  /* wifi access point interface */
>> STA /* wifi station */
>>
>> I will cook up a patch for consideration after veth goes in.
>
> Ben, could you please tell us what sysfs features you plan to implement?

I think this is the only thing that has a chance of getting into the kernel.

Basically, I have a user-space app and I want to be able to definitively
know the type for all interfaces. Currently, I have a hodge-podge of logic
to query various ioctls and /proc files and finally, guess by name if
nothing else works. There must be a better way :P

I have another sysfs patch that allows setting a default skb->mark for an
interface so that you can set the skb->mark before it hits the connection
tracking logic, but I've been told this one has very little chance of
getting into the kernel. The skb->mark patch is only useful (as far as I
can tell) if you also include a patch Patrick McHardy did for me that
allowed the conn-tracking logic to use skb->mark as part of its tuple.
This allows me to do NAT between virtual routers (routing tables) on the
same machine using veth-equivalent drivers to connect the routers. He
thinks this will probably not ever get into the kernel either.

I have another sysctl-related send-to-self patch that also has little
chance of getting into the kernel, but it might be quite useful with veth
(it's useful to me... but my needs aren't exactly mainstream :))

I'll post this separately for consideration.

Thanks,
Ben
Re: [PATCH] Virtual ethernet tunnel (v.2)
Ben Greear wrote:

[snip]

>>> I would also like some way to identify veth from other device types,
>>> preferably something like a value in sysfs. However, that should not
>>> hold up
>>
>> We can do this with ethtool. It can get and print the driver name of
>> the device.
>
> I think I'd like something in sysfs that we could query for any
> interface. Possible return strings could be:
> VLAN
> VETH
> ETH
> PPP
> BRIDGE
> AP  /* wifi access point interface */
> STA /* wifi station */
>
> I will cook up a patch for consideration after veth goes in.

Ben, could you please tell us what sysfs features you plan to implement?

Thanks,
Pavel
Re: [PATCH] Virtual ethernet tunnel (v.2)
Pavel Emelianov wrote:
> Hmm... The loopback must be doing bad things then. It first calls
> eth_type_trans and then accounts for the new skb->len.

Perhaps it should be changed. e100 calculates the entire frame as far as
I can tell, and e1000 and tg3 do it in hardware (not sure what all they
are counting, but I *think* it includes the header...)

VLANs calculate before pulling their header, though the ethernet header
has already been pulled by the time VLAN sees the skb.

Thanks,
Ben
Re: [PATCH] Virtual ethernet tunnel (v.2)
Ben Greear wrote:
> Pavel Emelianov wrote:
>> Ben Greear wrote:
>>
>>> Pavel Emelianov wrote:
>>>> Veth stands for Virtual ETHernet. It is a simple tunnel driver that
>>>> works at the link layer and looks like a pair of ethernet devices
>>>> interconnected with each other.
>>>
>>> As Dave mentioned, there is already a driver known as 'veth'. Maybe
>>> borrow the etun name as well?
>>
>> We have already seen that this driver uses ethXXX names for
>> its devices and Dave agreed with veth one. Moreover Alexey
>> Kuznetsov said that he would prefer the name veth for etun.
>
> Ok, fine by me. I started reading mail from the wrong direction this
> morning :)
>
>>> I would also like some way to identify veth from other device types,
>>> preferably something like a value in sysfs. However, that should not
>>> hold up
>>
>> We can do this with ethtool. It can get and print the driver name of
>> the device.
>
> I think I'd like something in sysfs that we could query for any
> interface. Possible return strings could be:
> VLAN
> VETH
> ETH
> PPP
> BRIDGE
> AP  /* wifi access point interface */
> STA /* wifi station */
>
> I will cook up a patch for consideration after veth goes in.

OK.

>>> I think you need at least the option to zero out the time-stamp,
>>> otherwise it will not be re-calculated when received on the peer, and
>>> it potentially spent significant time since it was last calculated
>>> (think netem delay or similar).
>>>
>>> +/* Zero out the time-stamp so that receiving code is forced
>>> + * to recalculate it.
>>> + */
>>> +skb->tstamp.off_sec = 0;
>>> +skb->tstamp.off_usec = 0;
>>>
>>>> +
>>>> +	rcv_priv = netdev_priv(rcv);
>>>> +	skb->pkt_type = PACKET_HOST;
>>>> +	skb->protocol = eth_type_trans(skb, rcv);
>>>> +	if (dev->features & NETIF_F_NO_CSUM)
>>>> +		skb->ip_summed = rcv_priv->ip_summed;
>>>> +
>>>> +	dst_release(skb->dst);
>>>> +	skb->dst = NULL;
>>>> +	secpath_reset(skb);
>>>> +	nf_reset(skb);
>>>> +	skb->mark = 0;
>>>> +
>>>> +	length = skb->len;
>>>
>>> This should be done before you do the eth_type_trans, as that pulls
>>> the header and your byte counters will be off.
>>
>> This will be ETH_HLEN larger, do you mean this? I think this is
>> normal as this device tries to look like an "iron" ethernet card :)
>
> For device counters, it should count the number of bytes received,
> including all headers, but excluding the ethernet FCS. If an 'iron'
> card did differently, I'd consider it a bug.

Hmm... The loopback must be doing bad things then. It first calls
eth_type_trans and then accounts for the new skb->len.

> Thanks,
> Ben
Re: [PATCH] Virtual ethernet tunnel (v.2)
Pavel Emelianov wrote:
> Ben Greear wrote:
>> Pavel Emelianov wrote:
>>> Veth stands for Virtual ETHernet. It is a simple tunnel driver
>>> that works at the link layer and looks like a pair of ethernet
>>> devices interconnected with each other.
>>
>> As Dave mentioned, there is already a driver known as 'veth'. Maybe
>> borrow the etun name as well?
>
> We have already seen that this driver uses ethXXX names for
> its devices and Dave agreed with veth one. Moreover Alexey
> Kuznetsov said that he would prefer the name veth for etun.

Ok, fine by me. I started reading mail from the wrong direction this
morning :)

>> I would also like some way to identify veth from other device types,
>> preferably something like a value in sysfs. However, that should not
>> hold up
>
> We can do this with ethtool. It can get and print the driver name of
> the device.

I think I'd like something in sysfs that we could query for any
interface. Possible return strings could be:
VLAN
VETH
ETH
PPP
BRIDGE
AP  /* wifi access point interface */
STA /* wifi station */

I will cook up a patch for consideration after veth goes in.

>> I think you need at least the option to zero out the time-stamp,
>> otherwise it will not be re-calculated when received on the peer, and
>> it potentially spent significant time since it was last calculated
>> (think netem delay or similar).
>>
>> +/* Zero out the time-stamp so that receiving code is forced
>> + * to recalculate it.
>> + */
>> +skb->tstamp.off_sec = 0;
>> +skb->tstamp.off_usec = 0;
>>
>>> +
>>> +	rcv_priv = netdev_priv(rcv);
>>> +	skb->pkt_type = PACKET_HOST;
>>> +	skb->protocol = eth_type_trans(skb, rcv);
>>> +	if (dev->features & NETIF_F_NO_CSUM)
>>> +		skb->ip_summed = rcv_priv->ip_summed;
>>> +
>>> +	dst_release(skb->dst);
>>> +	skb->dst = NULL;
>>> +	secpath_reset(skb);
>>> +	nf_reset(skb);
>>> +	skb->mark = 0;
>>> +
>>> +	length = skb->len;
>>
>> This should be done before you do the eth_type_trans, as that pulls
>> the header and your byte counters will be off.
>
> This will be ETH_HLEN larger, do you mean this? I think this is
> normal as this device tries to look like an "iron" ethernet card :)

For device counters, it should count the number of bytes received,
including all headers, but excluding the ethernet FCS. If an 'iron'
card did differently, I'd consider it a bug.

Thanks,
Ben
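The disagreement is purely about ordering: eth_type_trans() pulls the ETH_HLEN-byte (14-byte) Ethernet header off the skb, so reading skb->len afterwards undercounts every frame by 14 bytes. The toy userspace model below uses a hypothetical `struct toy_skb` and hypothetical helpers (not kernel API) to show the difference for a minimum-size 60-byte frame (64 bytes on the wire minus the 4-byte FCS, which counters exclude).

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HLEN 14 /* dst MAC (6) + src MAC (6) + EtherType (2) */

/* Toy stand-in for an sk_buff: only the length matters here. */
struct toy_skb {
	size_t len; /* bytes from the current data pointer to the frame end */
};

struct toy_counters {
	unsigned long rx_bytes;
};

/* Models the header pull done by eth_type_trans(): the data pointer
 * advances past the Ethernet header, so len shrinks by ETH_HLEN. */
static void toy_pull_eth_header(struct toy_skb *skb)
{
	assert(skb->len >= ETH_HLEN);
	skb->len -= ETH_HLEN;
}

/* Order used in the v.2 patch: pull first, then count. */
void count_after_pull(struct toy_skb *skb, struct toy_counters *c)
{
	toy_pull_eth_header(skb);
	c->rx_bytes += skb->len;
}

/* Order Ben suggests: count the whole frame (headers included,
 * FCS excluded), then pull. */
void count_before_pull(struct toy_skb *skb, struct toy_counters *c)
{
	c->rx_bytes += skb->len;
	toy_pull_eth_header(skb);
}
```

For a 60-byte frame, count_before_pull() credits 60 bytes and count_after_pull() credits 46; both leave the skb at the same length afterwards, so only the counters differ.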
Re: [PATCH] Virtual ethernet tunnel (v.2)
Ben Greear wrote:
> Pavel Emelianov wrote:
>> Veth stands for Virtual ETHernet. It is a simple tunnel driver
>> that works at the link layer and looks like a pair of ethernet
>> devices interconnected with each other.
>>
> As Dave mentioned, there is already a driver known as 'veth'. Maybe
> borrow the etun name as well?

We have already seen that this driver uses ethXXX names for
its devices and Dave agreed with veth one. Moreover Alexey
Kuznetsov said that he would prefer the name veth for etun.

> I would also like some way to identify veth from other device types,
> preferably something like a value in sysfs. However, that should not
> hold up

We can do this with ethtool. It can get and print the driver name of
the device.

> consideration of this patch, and I am willing to submit a patch after
> this goes in to add the functionality I want...

Ok. Thanks.

>> +/*
>> + * xmit
>> + */
>> +
>> +static int veth_xmit(struct sk_buff *skb, struct net_device *dev)
>> +{
>> +	struct net_device *rcv = NULL;
>> +	struct veth_device_stats *stats;
>> +	struct veth_priv *priv, *rcv_priv;
>> +	int length, cpu;
>> +
>> +	skb_orphan(skb);
>> +
>> +	priv = netdev_priv(dev);
>> +	cpu = smp_processor_id();
>> +	stats = per_cpu_ptr(priv->stats, cpu);
>> +	rcv = priv->peer;
>> +
>> +	if (!(rcv->flags & IFF_UP))
>> +		goto outf;
>>
> I think you need at least the option to zero out the time-stamp,
> otherwise it will not be re-calculated when received on the peer, and
> it potentially spent significant time since it was last calculated
> (think netem delay or similar).
>
> +/* Zero out the time-stamp so that receiving code is forced
> + * to recalculate it.
> + */
> +skb->tstamp.off_sec = 0;
> +skb->tstamp.off_usec = 0;
>
>> +
>> +	rcv_priv = netdev_priv(rcv);
>> +	skb->pkt_type = PACKET_HOST;
>> +	skb->protocol = eth_type_trans(skb, rcv);
>> +	if (dev->features & NETIF_F_NO_CSUM)
>> +		skb->ip_summed = rcv_priv->ip_summed;
>> +
>> +	dst_release(skb->dst);
>> +	skb->dst = NULL;
>> +	secpath_reset(skb);
>> +	nf_reset(skb);
>> +	skb->mark = 0;
>> +
>> +	length = skb->len;
>>
> This should be done before you do the eth_type_trans, as that pulls the
> header and your byte counters will be off.

This will be ETH_HLEN larger, do you mean this? I think this is
normal as this device tries to look like an "iron" ethernet card :)

>> +
>> +	stats->tx_bytes += length;
>> +	stats->tx_packets++;
>> +
>> +	stats = per_cpu_ptr(rcv_priv->stats, cpu);
>> +	stats->rx_bytes += length;
>> +	stats->rx_packets++;
>> +
>> +	netif_rx(skb);
>> +	return 0;
>> +
>> +outf:
>> +	kfree_skb(skb);
>> +	stats->tx_dropped++;
>> +	return 0;
>> +}
>>
> Thanks,
> Ben
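In veth_xmit() above, the sender's tx counters and the peer's rx counters are both taken from the slot belonging to the current CPU (per_cpu_ptr(..., cpu)), so CPUs transmitting at the same time should never write the same counters. The code that folds the per-CPU slots into device totals is not part of the quoted excerpt. The userspace model below sketches the idea under stated assumptions: a fixed `TOY_NR_CPUS` array stands in for alloc_percpu()/per_cpu_ptr(), and `struct toy_veth`, `toy_xmit_account` and `toy_sum_stats` are hypothetical names, not the driver's code.

```c
#define TOY_NR_CPUS 4 /* hypothetical fixed CPU count for the model */

/* Same fields as struct veth_device_stats in the patch. */
struct veth_device_stats {
	unsigned long rx_packets;
	unsigned long tx_packets;
	unsigned long rx_bytes;
	unsigned long tx_bytes;
	unsigned long tx_dropped;
};

/* Toy device: one stats slot per CPU instead of alloc_percpu(). */
struct toy_veth {
	struct toy_veth *peer;
	int up;
	struct veth_device_stats stats[TOY_NR_CPUS];
};

/* Mirrors the accounting in veth_xmit(): tx goes into the sender's slot
 * for this CPU, rx into the peer's slot for the same CPU, and a packet
 * sent while the peer is down only bumps tx_dropped. */
void toy_xmit_account(struct toy_veth *dev, int cpu, unsigned long len)
{
	struct veth_device_stats *stats = &dev->stats[cpu];

	if (!dev->peer->up) {
		stats->tx_dropped++;
		return;
	}

	stats->tx_bytes += len;
	stats->tx_packets++;

	stats = &dev->peer->stats[cpu];
	stats->rx_bytes += len;
	stats->rx_packets++;
}

/* Fold all per-CPU slots into one total, as a get_stats hook might. */
struct veth_device_stats toy_sum_stats(const struct toy_veth *dev)
{
	struct veth_device_stats total = {0};
	int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		const struct veth_device_stats *s = &dev->stats[cpu];

		total.rx_packets += s->rx_packets;
		total.tx_packets += s->tx_packets;
		total.rx_bytes += s->rx_bytes;
		total.tx_bytes += s->tx_bytes;
		total.tx_dropped += s->tx_dropped;
	}
	return total;
}
```

The trade-off is the usual one for per-CPU counters: the fast path touches only CPU-local data, while reading statistics costs a loop over every CPU.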
Re: [PATCH] Virtual ethernet tunnel (v.2)
Pavel Emelianov wrote:
> Veth stands for Virtual ETHernet. It is a simple tunnel driver
> that works at the link layer and looks like a pair of ethernet
> devices interconnected with each other.

As Dave mentioned, there is already a driver known as 'veth'. Maybe
borrow the etun name as well?

I would also like some way to identify veth from other device types,
preferably something like a value in sysfs. However, that should not
hold up consideration of this patch, and I am willing to submit a patch
after this goes in to add the functionality I want...

> +/*
> + * xmit
> + */
> +
> +static int veth_xmit(struct sk_buff *skb, struct net_device *dev)
> +{
> +	struct net_device *rcv = NULL;
> +	struct veth_device_stats *stats;
> +	struct veth_priv *priv, *rcv_priv;
> +	int length, cpu;
> +
> +	skb_orphan(skb);
> +
> +	priv = netdev_priv(dev);
> +	cpu = smp_processor_id();
> +	stats = per_cpu_ptr(priv->stats, cpu);
> +	rcv = priv->peer;
> +
> +	if (!(rcv->flags & IFF_UP))
> +		goto outf;

I think you need at least the option to zero out the time-stamp,
otherwise it will not be re-calculated when received on the peer, and it
potentially spent significant time since it was last calculated (think
netem delay or similar).

+/* Zero out the time-stamp so that receiving code is forced
+ * to recalculate it.
+ */
+skb->tstamp.off_sec = 0;
+skb->tstamp.off_usec = 0;

> +
> +	rcv_priv = netdev_priv(rcv);
> +	skb->pkt_type = PACKET_HOST;
> +	skb->protocol = eth_type_trans(skb, rcv);
> +	if (dev->features & NETIF_F_NO_CSUM)
> +		skb->ip_summed = rcv_priv->ip_summed;
> +
> +	dst_release(skb->dst);
> +	skb->dst = NULL;
> +	secpath_reset(skb);
> +	nf_reset(skb);
> +	skb->mark = 0;
> +
> +	length = skb->len;

This should be done before you do the eth_type_trans, as that pulls the
header and your byte counters will be off.

> +
> +	stats->tx_bytes += length;
> +	stats->tx_packets++;
> +
> +	stats = per_cpu_ptr(rcv_priv->stats, cpu);
> +	stats->rx_bytes += length;
> +	stats->rx_packets++;
> +
> +	netif_rx(skb);
> +	return 0;
> +
> +outf:
> +	kfree_skb(skb);
> +	stats->tx_dropped++;
> +	return 0;
> +}

Thanks,
Ben
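Ben's time-stamp suggestion relies on the receive path stamping a packet only when it carries no timestamp yet (an off_sec of zero, as far as I can tell from kernels of this era); otherwise the peer keeps the stale stamp taken before, say, a netem delay. The model below assumes that behaviour rather than reproducing kernel code; `struct toy_skb`, `toy_rx_stamp` and `toy_clear_stamp` are hypothetical names, and the current time is passed in explicitly so the effect is easy to check.

```c
/* Toy stand-in for the 2.6.20-era skb timestamp (seconds + microseconds). */
struct toy_tstamp {
	long off_sec;
	long off_usec;
};

struct toy_skb {
	struct toy_tstamp tstamp;
};

/* Assumed receive-side rule: stamp only if no timestamp is present. */
void toy_rx_stamp(struct toy_skb *skb, long now_sec, long now_usec)
{
	if (skb->tstamp.off_sec == 0) {
		skb->tstamp.off_sec = now_sec;
		skb->tstamp.off_usec = now_usec;
	}
}

/* Ben's proposed veth change: clear the stamp before handing the packet
 * to the peer, so the receive side records a fresh one. */
void toy_clear_stamp(struct toy_skb *skb)
{
	skb->tstamp.off_sec = 0;
	skb->tstamp.off_usec = 0;
}
```

A packet stamped at t=100 s that reaches the peer at t=250 s keeps the 100 s stamp unless toy_clear_stamp() runs first, after which the receive side records 250 s.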
[PATCH] Virtual ethernet tunnel (v.2)
Veth stands for Virtual ETHernet. It is a simple tunnel driver that works
at the link layer and looks like a pair of ethernet devices interconnected
with each other.

Mainly it allows communication between network namespaces, but it can be
used as is as well.

Eric recently sent a similar driver called etun. This implementation uses
another interface: the RTM_NEWLINK message introduced by Patrick. The
newlink callback is organized so that it will be easy to create the peer
device in a separate namespace once namespaces are in the kernel.

Changes from v.1:
 * percpu statistics;
 * standard convention for nla policy names;
 * module alias added;
 * xmit function fixes noticed by Patrick;
 * code cleanup.

The patch for an ip utility is also provided.

Signed-off-by: Pavel Emelianov <[EMAIL PROTECTED]>

Since the ethtool interface was taken from Eric's patch, I think that he
would like to see his Signed-off-by line as well (however, he didn't
answer yesterday).

---

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 7d57f4a..7e144be 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -119,6 +119,12 @@ config TUN
 
 	  If you don't know what to use this for, you don't need it.
 
+config VETH
+	tristate "Virtual ethernet device"
+	---help---
+	  The device is an ethernet tunnel. Devices are created in pairs. When
+	  one end receives the packet it appears on its pair and vice versa.
+
 config NET_SB1000
 	tristate "General Instruments Surfboard 1000"
 	depends on PNP
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index a77affa..4764119 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -185,6 +185,7 @@ obj-$(CONFIG_MACSONIC) += macsonic.o
 obj-$(CONFIG_MACMACE) += macmace.o
 obj-$(CONFIG_MAC89x0) += mac89x0.o
 obj-$(CONFIG_TUN) += tun.o
+obj-$(CONFIG_VETH) += veth.o
 obj-$(CONFIG_NET_NETX) += netx-eth.o
 obj-$(CONFIG_DL2K) += dl2k.o
 obj-$(CONFIG_R8169) += r8169.o
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
new file mode 100644
index 000..e7ad43d
--- /dev/null
+++ b/drivers/net/veth.c
@@ -0,0 +1,442 @@
+/*
+ *  drivers/net/veth.c
+ *
+ *  Copyright (C) 2007 OpenVZ http://openvz.org, SWsoft Inc
+ *
+ *  Author: Pavel Emelianov <[EMAIL PROTECTED]>
+ *
+ */
+
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+
+#define DRV_NAME "veth"
+#define DRV_VERSION "1.0"
+
+struct veth_device_stats {
+	unsigned long rx_packets;
+	unsigned long tx_packets;
+	unsigned long rx_bytes;
+	unsigned long tx_bytes;
+	unsigned long tx_dropped;
+};
+
+struct veth_priv {
+	struct net_device *peer;
+	struct net_device *dev;
+	struct list_head list;
+	struct veth_device_stats *stats;
+	unsigned ip_summed;
+};
+
+static LIST_HEAD(veth_list);
+
+/*
+ * ethtool interface
+ */
+
+static struct {
+	const char string[ETH_GSTRING_LEN];
+} ethtool_stats_keys[] = {
+	{ "peer_ifindex" },
+};
+
+static int veth_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+{
+	cmd->supported = 0;
+	cmd->advertising = 0;
+	cmd->speed = SPEED_1;
+	cmd->duplex = DUPLEX_FULL;
+	cmd->port = PORT_TP;
+	cmd->phy_address = 0;
+	cmd->transceiver = XCVR_INTERNAL;
+	cmd->autoneg = AUTONEG_DISABLE;
+	cmd->maxtxpkt = 0;
+	cmd->maxrxpkt = 0;
+	return 0;
+}
+
+static void veth_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info)
+{
+	strcpy(info->driver, DRV_NAME);
+	strcpy(info->version, DRV_VERSION);
+	strcpy(info->fw_version, "N/A");
+}
+
+static void veth_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
+{
+	switch(stringset) {
+	case ETH_SS_STATS:
+		memcpy(buf, &ethtool_stats_keys, sizeof(ethtool_stats_keys));
+		break;
+	}
+}
+
+static int veth_get_stats_count(struct net_device *dev)
+{
+	return ARRAY_SIZE(ethtool_stats_keys);
+}
+
+static void veth_get_ethtool_stats(struct net_device *dev,
+		struct ethtool_stats *stats, u64 *data)
+{
+	struct veth_priv *priv;
+
+	priv = netdev_priv(dev);
+	data[0] = priv->peer->ifindex;
+}
+
+static u32 veth_get_rx_csum(struct net_device *dev)
+{
+	struct veth_priv *priv;
+
+	priv = netdev_priv(dev);
+	return priv->ip_summed == CHECKSUM_UNNECESSARY;
+}
+
+static int veth_set_rx_csum(struct net_device *dev, u32 data)
+{
+	struct veth_priv *priv;
+
+	priv = netdev_priv(dev);
+	priv->ip_summed = data ? CHECKSUM_UNNECESSARY : CHECKSUM_NONE;
+	return 0;
+}
+
+static u32 veth_get_tx_csum(struct net_device *dev)
+{
+	return (dev->features & NETIF_F_NO_CSUM) != 0;
+}
+
+static int veth_set_tx_csum(struct net_device *dev, u32 data)
+{
+	if (data)
+		dev->features |= NETI