Re: [RFC RTNETLINK 00/09]: Netlink link creation API
Alexey Kuznetsov [EMAIL PROTECTED] writes:

Hello!

Good point, I didn't think of that. Is there a version of this patch that already uses different namespaces so I can look at it?

Pavel does not like the idea. It looks not exactly pretty, like you said. :-) The alternative is to create the pair in the main namespace and then move one end to another namespace, renaming it and changing its index. Why do I not like it? Because this makes RTM_NEWLINK just a useless step: all its work is undone and the real work is redone when the device moves, with all the unprettiness moved to another place.

- An operation to move a network device between namespaces is necessary.
- If we limit these devices to just communication between namespaces we severely limit their utility. In particular there are known applications now that do not need this.
- Further, I believe that by using RTM_NEWLINK the ethernet tunnel driver will never need to have any code that knows about namespaces; all that is needed is for RTM_NEWLINK to have an appropriate default network namespace (the network namespace of the netlink socket).

On the other hand, some move operation is required in any case. Right now in openvz the problem is solved in a tricky, but quite interesting way: all the devices in main namespace are assigned with odd index, child devices get odd index. So that, when a device moves from main namespace to child, openvz does not need to change ifindex, conflict is impossible. Well, it is a working approach. But it is not pretty either.

We can solve the ifindex change even more simply by using a global ifindex sequence number for now. In the context of migration that is likely to prove insufficient for virtual devices, but for now it is simple, it already exists and it is good enough.

Are network namespaces completely separated or is there some hierarchy with all lower namespaces visible above or something like that?

Right now they are completely separate. It is possible to make child devices visible in the parent namespace like it is done for process pids: i.e. there is an abstract identity which is seen under different names and indices in different namespaces. Sounds cool, but this adds a lot of complexity, which has no meaning outside of the context of device creation. I do not think it is worth doing.

I completely agree. There is no advantage and a considerable disadvantage in having network namespaces be anything other than completely separate.

The identity of the main device has no meaning within a different namespace, but are there other reasons for hiding it?

Perhaps, security. It is not a good idea to leak information about the parent namespace to a child namespace. Also, people will want to see emulated ethernet inside a namespace looking exactly like ethernet. No freaking additional attributes.

As long as we keep ourselves within the usual variation of ethernet network devices we should be fine. For someone who wants to know, we can't hide the fact that we are a particular kind of ethernet device.

Eric

- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
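The global ifindex sequence suggested above can be illustrated with a toy model. This is only a sketch under stated assumptions: `Namespace`, `create` and `move` are hypothetical names invented for illustration, not kernel or openvz code. It contrasts per-namespace counters, where moving a device while keeping its index can collide, with a single counter shared by all namespaces, where it cannot.

```python
import itertools


class Namespace:
    """Toy network namespace (hypothetical): maps ifindex -> device name."""

    def __init__(self, name, counter):
        self.name = name
        self.devices = {}
        self._counter = counter  # source of new ifindex values

    def create(self, devname):
        ifindex = next(self._counter)
        self.devices[ifindex] = devname
        return ifindex


def move(ifindex, src, dst):
    """Move a device keeping its ifindex; fail if dst already uses that index."""
    if ifindex in dst.devices:
        raise ValueError("ifindex %d already used in %s" % (ifindex, dst.name))
    dst.devices[ifindex] = src.devices.pop(ifindex)


# Per-namespace counters: both namespaces hand out 1, 2, 3, ...
main = Namespace("main", itertools.count(1))
child = Namespace("child", itertools.count(1))
child.create("lo")           # ifindex 1 in the child
veth = main.create("veth0")  # ifindex 1 in main as well
try:
    move(veth, main, child)
    per_ns_conflict = False
except ValueError:
    per_ns_conflict = True

# One global counter shared by every namespace: indices are never reused.
shared = itertools.count(1)
main_g = Namespace("main", shared)
child_g = Namespace("child", shared)
child_g.create("lo")
veth_g = main_g.create("veth0")
move(veth_g, main_g, child_g)  # no renumbering needed
```

The sketch says nothing about migration between machines, which the message above already flags as a case where a global sequence is likely to be insufficient.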
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
Eric W. Biederman wrote:

Reading through the patches they look usable to me. Having to patch iproute to create the more interesting network devices sucks, but that problem seems fundamental. We might be able to avoid it if we allowed fields to be reused between different types of devices but that makes the error checking trickier, and we aren't likely to have that many types of devices so there likely isn't much value in generalizing.

You don't really need to patch it, installing a new iplink_XXX.so file is enough. Generalizing driver specific options more than what we currently have doesn't look very promising. I think your driver was simple enough to get along with the generic device attributes though (IFLA_LINK or IFLA_MASTER).

I do think we should specify the IFLA_KIND (was: IFLA_NAME) values in a header file. So it is easy to get a list of all of the different kinds and so we don't conflict.

I don't think conflicts are going to be a problem (it would be nice if modpost would warn about duplicate aliases though). How is listing IFLA_KIND types in a header file going to help get a list? Presuming the user knows what kind of device he wants to set up and is not just looking for things to play around with I also don't see any real value in this.
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
Patrick McHardy wrote:

The following patches contain the rtnetlink link creation API I promised, as well as two simple driver conversions to use the API as an example. I've also converted VLAN as a more complex example, but these patches need some more work and are most likely not interesting to all the CCed parties, so I'm sending them separately.

I've updated the patches to remove the broken VLAN ID change, added back some consts, renamed IFLA_INFO_NAME to IFLA_INFO_KIND and rebased to current net-2.6.

The current patches and -git trees can be found at http://people.netfilter.org/kaber/rtnl_link/
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
Patrick McHardy [EMAIL PROTECTED] writes:

Eric W. Biederman wrote:

Reading through the patches they look usable to me. Having to patch iproute to create the more interesting network devices sucks, but that problem seems fundamental. We might be able to avoid it if we allowed fields to be reused between different types of devices but that makes the error checking trickier, and we aren't likely to have that many types of devices so there likely isn't much value in generalizing.

You don't really need to patch it, installing a new iplink_XXX.so file is enough. Generalizing driver specific options more than what we currently have doesn't look very promising. I think your driver was simple enough to get along with the generic device attributes though (IFLA_LINK or IFLA_MASTER).

I need to know the other device in the pair of devices I am creating. If ifindex was selectable, IFLA_LINK or IFLA_MASTER might be interesting; however it currently is not, and I'm not quite certain about letting a user select the ifindex. Although there may come a point when dealing with migration when it makes sense.

Hmm. I guess if I had a reasonable default I could find out the pair device by looking at the returned NEW_LINK message.

Looking more closely. IFLA_MASTER does not work because I don't have a master/slave relationship. IFLA_LINK looks like it will work. I don't precisely match the semantics though, which makes me nervous. In particular my other device is not something I am sending through but what I am sending to. The way the IPv6 code uses iflink to get the link local address starting with the hardware address of the iflink would be completely the wrong thing to do in my case. Now my device won't have the magic IPv6 tunnel arp type so that code won't trigger. Still it is a challenge. I still think adding an IFLA_PARTNER or a custom attribute is cleaner in this case. Slight semantic mismatches are the worst design bugs to correct.

To some extent this is a practical problem for me, because in the context of multiple network namespaces I could theoretically have both network devices have the same name and the same ifindex in different network namespaces. Although it really doesn't matter unless they are in the same network namespace, in which case they can't have the same ifindex or ifname.

I do think we should specify the IFLA_KIND (was: IFLA_NAME) values in a header file. So it is easy to get a list of all of the different kinds and so we don't conflict.

I don't think conflicts are going to be a problem (it would be nice if modpost would warn about duplicate aliases though). How is listing IFLA_KIND types in a header file going to help get a list? Presuming the user knows what kind of device he wants to set up and is not just looking for things to play around with I also don't see any real value in this.

This isn't about the user, this is about maintaining the ABI. We have to control the set of strings for IFLA_KIND. Having them all in a single header file means that we can easily look, when we add support for a new kind, to see if some other driver has already used that kind. The same reason we stick the rest of the enumerations into a header file. Strings don't conflict as easily as small integers do, but it is still possible to have a conflict, so having something like an ifla_kind.h to hold them would be useful.

Eric
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
Eric W. Biederman wrote:

Patrick McHardy [EMAIL PROTECTED] writes:

You don't really need to patch it, installing a new iplink_XXX.so file is enough. Generalizing driver specific options more than what we currently have doesn't look very promising. I think your driver was simple enough to get along with the generic device attributes though (IFLA_LINK or IFLA_MASTER).

I need to know the other device in the pair of devices I am creating. If ifindex was selectable, IFLA_LINK or IFLA_MASTER might be interesting; however it currently is not, and I'm not quite certain about letting a user select the ifindex. Although there may come a point when dealing with migration when it makes sense.

It shouldn't be very hard to implement, so far I just didn't see any use for it.

Hmm. I guess if I had a reasonable default I could find out the pair device by looking at the returned NEW_LINK message.

Looking more closely. IFLA_MASTER does not work because I don't have a master/slave relationship. IFLA_LINK looks like it will work. I don't precisely match the semantics though, which makes me nervous. In particular my other device is not something I am sending through but what I am sending to. The way the IPv6 code uses iflink to get the link local address starting with the hardware address of the iflink would be completely the wrong thing to do in my case. Now my device won't have the magic IPv6 tunnel arp type so that code won't trigger. Still it is a challenge. I still think adding an IFLA_PARTNER or a custom attribute is cleaner in this case. Slight semantic mismatches are the worst design bugs to correct.

Indeed, IFLA_PARTNER sounds like a better idea. I just suggested to Pavel to create only a single device per newlink operation and bind them later, what do you think about that?

I do think we should specify the IFLA_KIND (was: IFLA_NAME) values in a header file. So it is easy to get a list of all of the different kinds and so we don't conflict.

I don't think conflicts are going to be a problem (it would be nice if modpost would warn about duplicate aliases though). How is listing IFLA_KIND types in a header file going to help get a list? Presuming the user knows what kind of device he wants to set up and is not just looking for things to play around with I also don't see any real value in this.

This isn't about the user, this is about maintaining the ABI. We have to control the set of strings for IFLA_KIND. Having them all in a single header file means that we can easily look, when we add support for a new kind, to see if some other driver has already used that kind. The same reason we stick the rest of the enumerations into a header file. Strings don't conflict as easily as small integers do, but it is still possible to have a conflict, so having something like an ifla_kind.h to hold them would be useful.

Mhh .. we have multiple string based APIs that do just fine. I'd prefer having someone adding a new driver do a quick grep for MODULE_ALIAS_RTNL_LINK to adding unused definitions.
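The "quick grep for MODULE_ALIAS_RTNL_LINK" and the wish for a duplicate-alias warning could be approximated outside the build system. The following is a minimal sketch, assuming that each driver declares its kind with a literal string argument on one line, as in `MODULE_ALIAS_RTNL_LINK("dummy")`. The helper name `find_duplicate_kinds` and the file names in the usage are hypothetical; this is not how modpost works.

```python
import re
from collections import defaultdict

# Matches MODULE_ALIAS_RTNL_LINK("kind") with a literal string argument.
ALIAS_RE = re.compile(r'MODULE_ALIAS_RTNL_LINK\(\s*"([^"]+)"\s*\)')


def find_duplicate_kinds(sources):
    """Return {kind: [files]} for every kind declared in more than one place.

    `sources` maps a file name to the text of that file.
    """
    seen = defaultdict(list)
    for filename, text in sources.items():
        for kind in ALIAS_RE.findall(text):
            seen[kind].append(filename)
    return {kind: files for kind, files in seen.items() if len(files) > 1}


# Hypothetical input: two drivers accidentally claiming the same kind.
example = {
    "drivers/net/a.c": 'MODULE_ALIAS_RTNL_LINK("dummy");',
    "drivers/net/b.c": 'MODULE_ALIAS_RTNL_LINK("ifb");',
    "drivers/net/c.c": 'MODULE_ALIAS_RTNL_LINK("dummy");',
}
duplicates = find_duplicate_kinds(example)
```

A scan like this only sees in-tree drivers, which is roughly the same coverage a shared header file would give.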
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
Hello!

I just suggested to Pavel to create only a single device per newlink operation and bind them later,

I see some logical inconsistency here. Look, the second end is supposed to be in another namespace. It will have an identity which cannot be expressed in any way in the namespace which is allowed to create the pair: name, ifindex - nothing is shared between namespaces.

Moreover, do not forget we have two netlink spaces as well. Events happening in one namespace are reported only inside that namespace.

Actually, the only self-consistent solution which I see right now (sorry, did not think that much) is to create the whole pair as one operation; required parameters (name of partner, identity of namespace) can be passed as attributes. I guess the IFLA_PARTNER approach suggests the same thing, right?

In response to this action two replies are generated: one RTM_NEWLINK for one end of the device, with the whole description of the partner, is broadcast inside this namespace; another RTM_NEWLINK with the index/name of the partner device is broadcast inside the second namespace (and, probably, some attributes which must be hidden inside the namespace, f.e. the identity of the main device, are suppressed).

Alexey
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
Patrick McHardy [EMAIL PROTECTED] writes:

I still think adding an IFLA_PARTNER or a custom attribute is cleaner in this case. Slight semantic mismatches are the worst design bugs to correct.

Indeed, IFLA_PARTNER sounds like a better idea. I just suggested to Pavel to create only a single device per newlink operation and bind them later, what do you think about that?

I don't think it solves much because we still need a way to report the partner device.

On the actual using side I think it makes the core of the driver much more difficult to get right. Basically if we can't count on having a partner device we have to add NULL pointer checks and locking to the packet dispatch which is otherwise unnecessary.

Eric
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
Eric W. Biederman wrote:

Patrick McHardy [EMAIL PROTECTED] writes:

I still think adding an IFLA_PARTNER or a custom attribute is cleaner in this case. Slight semantic mismatches are the worst design bugs to correct.

Indeed, IFLA_PARTNER sounds like a better idea. I just suggested to Pavel to create only a single device per newlink operation and bind them later, what do you think about that?

I don't think it solves much because we still need a way to report the partner device.

I was thinking of something like this:

    ip link add veth0 type veth
    ip link add veth1 partner veth0 type veth

ip would resolve veth0 to an ifindex and set IFLA_PARTNER. But Alexey just raised a few good points, so this might not work.

On the actual using side I think it makes the core of the driver much more difficult to get right. Basically if we can't count on having a partner device we have to add NULL pointer checks and locking to the packet dispatch which is otherwise unnecessary.

All you'd need to do is keep the queue stopped until the device is bound. No changes to rx or tx path necessary.
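The "keep the queue stopped until the device is bound" idea can be shown with a toy model. This is a sketch under stated assumptions, not the veth or etun driver: `PairDevice`, `bind` and `stack_send` are hypothetical names, and `stack_send` merely stands in for a network core that never calls the transmit routine of a device whose queue is stopped. Under that assumption the transmit path needs no NULL check on the partner.

```python
class PairDevice:
    """Toy half of a device pair (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.partner = None
        self.queue_stopped = True  # stays stopped until bind() runs
        self.received = []

    def xmit(self, packet):
        # No partner check here: the caller only transmits on a running
        # queue, and the queue only runs once the partner is set.
        self.partner.received.append(packet)


def bind(a, b):
    """Attach both ends to each other, then start both queues."""
    a.partner, b.partner = b, a
    a.queue_stopped = b.queue_stopped = False


def stack_send(dev, packet):
    """Stand-in for the core stack: refuses to transmit on a stopped queue."""
    if dev.queue_stopped:
        return False
    dev.xmit(packet)
    return True


veth0 = PairDevice("veth0")
veth1 = PairDevice("veth1")
before = stack_send(veth0, b"ping")  # queue still stopped, nothing sent
bind(veth0, veth1)
after = stack_send(veth0, b"ping")   # delivered to veth1
```

The model ignores unbinding and concurrent teardown, which is where the locking concern raised in the quoted text would come back.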
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
Alexey Kuznetsov wrote:

I just suggested to Pavel to create only a single device per newlink operation and bind them later,

I see some logical inconsistency here. Look, the second end is supposed to be in another namespace. It will have an identity which cannot be expressed in any way in the namespace which is allowed to create the pair: name, ifindex - nothing is shared between namespaces.

Good point, I didn't think of that. Is there a version of this patch that already uses different namespaces so I can look at it? Are network namespaces completely separated or is there some hierarchy with all lower namespaces visible above or something like that?

Moreover, do not forget we have two netlink spaces as well. Events happening in one namespace are reported only inside that namespace.

Actually, the only self-consistent solution which I see right now (sorry, did not think that much) is to create the whole pair as one operation; required parameters (name of partner, identity of namespace) can be passed as attributes. I guess the IFLA_PARTNER approach suggests the same thing, right?

I imagined it more as a bind operation, pretty similar to enslave, so it would only contain an ifindex, no parameters. But as you say that doesn't work, so I guess we'd have to nest an entire ifinfomsg + the attributes for the partner device under it .. not exactly pretty.

In response to this action two replies are generated: one RTM_NEWLINK for one end of the device, with the whole description of the partner, is broadcast inside this namespace; another RTM_NEWLINK with the index/name of the partner device is broadcast inside the second namespace (and, probably, some attributes which must be hidden inside the namespace, f.e. the identity of the main device, are suppressed).

The identity of the main device has no meaning within a different namespace, but are there other reasons for hiding it?
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
Patrick McHardy [EMAIL PROTECTED] writes:

Alexey Kuznetsov wrote:

I just suggested to Pavel to create only a single device per newlink operation and bind them later,

I see some logical inconsistency here. Look, the second end is supposed to be in another namespace. It will have an identity which cannot be expressed in any way in the namespace which is allowed to create the pair: name, ifindex - nothing is shared between namespaces.

Good point, I didn't think of that. Is there a version of this patch that already uses different namespaces so I can look at it?

We have posted patches a couple of times, and veth or etun were always a part of it. But except for some bookkeeping details we really don't care.

Are network namespaces completely separated or is there some hierarchy with all lower namespaces visible above or something like that?

Completely separated. The goal is to look like two separate machines to user space, with respect to the network stack.

There is a bit of a hierarchy usage-wise, because frequently only one namespace will have real hardware devices in it, so everything needs to route through there. But that is a usage detail and is easiest not to reflect in the actual implementation.

I imagined it more as a bind operation, pretty similar to enslave, so it would only contain an ifindex, no parameters. But as you say that doesn't work, so I guess we'd have to nest an entire ifinfomsg + the attributes for the partner device under it .. not exactly pretty.

In the model I'm working in, there is a separate operation: move device to other namespace, which should work for any network device. So there should be an interval immediately after device creation when both devices are in the same namespace, and then one of the pair is moved to another namespace.

In response to this action two replies are generated: one RTM_NEWLINK for one end of the device, with the whole description of the partner, is broadcast inside this namespace; another RTM_NEWLINK with the index/name of the partner device is broadcast inside the second namespace (and, probably, some attributes which must be hidden inside the namespace, f.e. the identity of the main device, are suppressed).

The identity of the main device has no meaning within a different namespace, but are there other reasons for hiding it?

Not really. We can already recognize the type of the device.

Eric
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
Hello!

Good point, I didn't think of that. Is there a version of this patch that already uses different namespaces so I can look at it?

Pavel does not like the idea. It looks not exactly pretty, like you said. :-) The alternative is to create the pair in the main namespace and then move one end to another namespace, renaming it and changing its index. Why do I not like it? Because this makes RTM_NEWLINK just a useless step: all its work is undone and the real work is redone when the device moves, with all the unprettiness moved to another place.

On the other hand, some move operation is required in any case. Right now in openvz the problem is solved in a tricky, but quite interesting way: all the devices in main namespace are assigned with odd index, child devices get odd index. So that, when a device moves from main namespace to child, openvz does not need to change ifindex, conflict is impossible. Well, it is a working approach. But it is not pretty either.

Are network namespaces completely separated or is there some hierarchy with all lower namespaces visible above or something like that?

Right now they are completely separate. It is possible to make child devices visible in the parent namespace like it is done for process pids: i.e. there is an abstract identity which is seen under different names and indices in different namespaces. Sounds cool, but this adds a lot of complexity, which has no meaning outside of the context of device creation. I do not think it is worth doing.

The identity of the main device has no meaning within a different namespace, but are there other reasons for hiding it?

Perhaps, security. It is not a good idea to leak information about the parent namespace to a child namespace. Also, people will want to see emulated ethernet inside a namespace looking exactly like ethernet. No freaking additional attributes.

Alexey
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
From: Patrick McHardy [EMAIL PROTECTED]
Date: Tue, 5 Jun 2007 16:12:51 +0200 (MEST)

A few words about the API:

Drivers wishing to use the API register a struct rtnl_link_ops, which contains a few function pointers for device setup, registration, changing and deletion, as well as netlink attribute validation and device dumping.

All netlink communication happens within the AF_UNSPEC family. I initially introduced new netlink families for this, but removed them again since that would require adding new protocol families that serve no further purpose for most drivers. Additionally we currently use RTM.*LINK messages with ifi_family != AF_UNSPEC for information that is related to the device, but doesn't come from the driver that created the device itself, like bridge port state, IPv6 device configuration etc.

The device specific attributes are nested within a new attribute IFLA_LINKINFO. I didn't use IFLA_PROTINFO since userspace can reasonably expect to have IFLA_PROTINFO unset for AF_UNSPEC messages, and the userspace STP daemon does that. Identification of the driver happens by name, stored in the IFLA_INFO_NAME attribute. IFLA_INFO_DATA contains driver specific attributes, IFLA_INFO_XSTATS driver specific statistics.

The API does *not* use the existing RTM_SETLINK message type, instead it adds support for receiving RTM_NEWLINK within the kernel. I did this for three reasons:

- RTM_SETLINK does not follow the usual rtnetlink conventions and ignores all netlink flags
- Other rtnetlink subsystems use the same message type for dumps and notifications from the kernel as for configuration from userspace, which usually allows recreating an object by simply setting the NLM_F_REQUEST flag on a message received from the kernel and sending it back.
- Easier for userspace to detect support for the new features

The RTM_NEWLINK message type is a superset of RTM_SETLINK, it allows changing both driver specific and generic attributes of the device.

The set of generic device attributes that may be supplied during device creation is limited to a few simple ones, it currently does not support specifying link layer address/broadcast address as well as device flags. The change operation can change all device attributes.

Not sure what else to say .. comments welcome.

This excellent description of the APIs (particularly the background and reasoning) belongs in a file under Documentation/networking/ :-)
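The attribute nesting described above can be made concrete by packing an RTM_NEWLINK request by hand. This is a hedged sketch, not code from the patches: the numeric constants are the values found in present-day mainline headers, the kind attribute is called IFLA_INFO_KIND (the name IFLA_INFO_NAME was renamed to later in this thread), and `nlattr` and `newlink_request` are hypothetical helper names. The sketch only builds the byte string and does not talk to the kernel.

```python
import struct

# Values as in present-day mainline headers (an assumption with respect to
# the RFC patches discussed in this thread).
RTM_NEWLINK = 16
NLM_F_REQUEST, NLM_F_ACK, NLM_F_EXCL, NLM_F_CREATE = 0x001, 0x004, 0x200, 0x400
AF_UNSPEC = 0
IFLA_IFNAME = 3
IFLA_LINKINFO = 18
IFLA_INFO_KIND = 1
IFLA_INFO_DATA = 2


def nlattr(attr_type, payload):
    """One netlink attribute: (len, type) header, payload, padding to 4 bytes."""
    length = 4 + len(payload)
    padding = b"\0" * (-length % 4)
    return struct.pack("=HH", length, attr_type) + payload + padding


def newlink_request(ifname, kind, info_data=b"", seq=1):
    """Pack nlmsghdr + ifinfomsg + IFLA_IFNAME + nested IFLA_LINKINFO."""
    linkinfo = nlattr(IFLA_INFO_KIND, kind.encode() + b"\0")
    if info_data:
        # Driver specific attributes, already packed by the caller.
        linkinfo += nlattr(IFLA_INFO_DATA, info_data)
    attrs = nlattr(IFLA_IFNAME, ifname.encode() + b"\0")
    attrs += nlattr(IFLA_LINKINFO, linkinfo)
    # struct ifinfomsg: family, pad, type, index, flags, change
    ifinfo = struct.pack("=BxHiII", AF_UNSPEC, 0, 0, 0, 0)
    body = ifinfo + attrs
    flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL | NLM_F_ACK
    # struct nlmsghdr: len, type, flags, seq, pid
    header = struct.pack("=IHHII", 16 + len(body), RTM_NEWLINK, flags, seq, 0)
    return header + body


msg = newlink_request("dummy0", "dummy")
```

On Linux the resulting bytes could presumably be written to a NETLINK_ROUTE socket by a sufficiently privileged process. Whether the kernel accepts them depends on the kernel version and on a driver having registered the given kind.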
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
David Miller wrote:

From: Patrick McHardy [EMAIL PROTECTED]
Date: Tue, 5 Jun 2007 16:12:51 +0200 (MEST)

A few words about the API: [..] Not sure what else to say .. comments welcome.

This excellent description of the APIs (particularly the background and reasoning) belongs in a file under Documentation/networking/ :-)

I'll add something like this under Documentation/, thanks.
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
All patches looked really good. Speaking for the ifb stuff, a definite ACK.

The only thing that threw me off for a sec was the naming convention for the type referenced via IFLA_INFO_NAME, because it seems to collide semantically with dev->type and dev->name as in IFLA_NAME and ifi_type in ifinfomsg. But I can't come up with a better noun. Good stuff, nevertheless.

cheers,
jamal
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
jamal wrote:

All patches looked really good. Speaking for the ifb stuff, a definite ACK.

The only thing that threw me off for a sec was the naming convention for the type referenced via IFLA_INFO_NAME, because it seems to collide semantically with dev->type and dev->name as in IFLA_NAME and ifi_type in ifinfomsg. But I can't come up with a better noun.

How about IFLA_INFO_KIND (borrowed from sch_api)? I generally don't like the IFLA_INFO_ prefix very much, but so far didn't come up with something better. Suggestions welcome :)
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
On Wed, 2007-06-06 at 00:07 +0200, Patrick McHardy wrote:

How about IFLA_INFO_KIND (borrowed from sch_api)? I generally don't like the IFLA_INFO_ prefix very much, but so far didn't come up with something better. Suggestions welcome :)

KIND sounds a lot more tasty ;-> Thanks.

cheers,
jamal
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
Reading through the patches they look usable to me. Having to patch iproute to create the more interesting network devices sucks, but that problem seems fundamental. We might be able to avoid it if we allowed fields to be reused between different types of devices but that makes the error checking trickier, and we aren't likely to have that many types of devices so there likely isn't much value in generalizing.

I do think we should specify the IFLA_KIND (was: IFLA_NAME) values in a header file. So it is easy to get a list of all of the different kinds and so we don't conflict.

Eric