Re: [PATCH 0/1] IPN: Inter Process Networking
On Mon, Dec 17, 2007 at 04:10:19AM -0800, [EMAIL PROTECTED] wrote:
> if you are talking network connections between virtual systems, then the
> existing tap interfaces would seem to do everything you are looking for. you
> can add them to bridges, route between them, filter traffic between them
> (at whatever layer you want with netfilter), use multicast, etc as you
> would any real interface.
>
> if, however, you are talking about non-network communications (your example
> of sending raw video frames across the interface), and want multiple
> processes to receive them, this sounds like exactly the thing that splice
> was designed to do: distribute data to multiple recipients simultaneously
> and efficiently.

I'll try to explain. Our first interest was to interconnect virtual, real, and partially virtual machines. We developed VDE for this: a user-level L2 switch. Specific as it may be, it's quite popular as a simple but flexible tool. It can interconnect UML, Qemu, UMView, slirp, anything that can be connected to a tap interface, and so on.

So, you may say, it's a networking issue and we could live with tun/tap. There's a major point here: at present, dealing with tun/tap, bridges, and routing is quite difficult if you are a *regular* user with *no* capabilities at all. At most, you have tun/tap persistence and association to a specific user (or, recently, a group). That's good - we don't want regular users to mess with global networking rules and settings.

Think of a bunch of heterogeneous virtual machines and partial virtual machines (i.e. VMs where only a subset of system calls may be virtualized, depending on the parameters - that's the case of View-OS) that must be interconnected and that may or may not have a connection to a real network interface (maybe via a tunnel towards a different machine). There's no need for administrator intervention here.
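For concreteness, this is roughly what tun/tap "persistence and ownership" amounts to from user space - a minimal sketch, not part of the IPN patch; the interface name is illustrative, and creating the device still needs CAP_NET_ADMIN, which is exactly the limitation being discussed:

```c
/* Sketch: creating a persistent tap device and handing it to a user.
 * TUNSETIFF (interface creation) needs CAP_NET_ADMIN; TUNSETPERSIST
 * keeps the interface after close(); TUNSETOWNER lets one ordinary
 * user reattach later.  The interface name below is made up. */
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/* Create a persistent tap device owned by `owner`.
 * Returns the tap fd, or -1 with errno set. */
static int make_persistent_tap(const char *name, uid_t owner)
{
    struct ifreq ifr;
    int fd = open("/dev/net/tun", O_RDWR);
    if (fd < 0)
        return -1;

    memset(&ifr, 0, sizeof ifr);
    ifr.ifr_flags = IFF_TAP | IFF_NO_PI;     /* raw ethernet frames */
    strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);

    if (ioctl(fd, TUNSETIFF, &ifr) < 0 ||    /* needs CAP_NET_ADMIN */
        ioctl(fd, TUNSETPERSIST, 1) < 0 ||   /* survive close()     */
        ioctl(fd, TUNSETOWNER, owner) < 0) { /* allow reattach      */
        int err = errno;
        close(fd);
        errno = err;
        return -1;
    }
    return fd;
}
```

Everything past this point (bridging the taps together, filtering between them) again requires root, which is the part a regular user cannot do on his own.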
Why should a user have to ask root to create lots of tap interfaces for him, bind them into a bridge, and set up filtering/routing rules? What would the list of interfaces look like when different users asked for the same thing at the same time? You could define a specific interconnection bus, but we already have one: ethernet. VDE helps here, as it allows regular users to build distributed ethernet networks.

VDE works fine, but at present it often becomes a bottleneck because of the high number of user processes involved and the user-kernel-user switches needed to transfer a single ethernet frame. Moving the core inside the kernel would limit this problem and result in faster communication, still with no need for root intervention or messing with the global namespace. (We're considering whether something can be done with containers or similar structures, both for networking and for partial virtualization, but that's another topic.)

So we started thinking about how to use existing kernel structures, and we concluded that:
- no existing kernel structure appeared to be optimal for this work;
- if we had to design a new structure, it would be more useful if we made it as general as we could.

At present we're still focused on networking and the other applications are just examples, but we thought that adding a general, extensible multipoint IPC family is much better than adding the most specific solution to our current problem. Maybe people with experience in other fields can tell us whether there are other problems that can be solved, optimized, or simply made simpler with IPN. Maybe our proposal is not the best in terms of interface and semantics. But we feel it may fill an "empty space" among the available IPC mechanisms with a quite simple but powerful approach.

> for a new family to be valuable, you need to show what it does that isn't
> available in existing families.

Is it "more acceptable" to add a new address family or to add features to existing ones?
(My question is purely informative, I don't want to sound sarcastic or anything.) For instance, someone proposed "let's just add access control to the netlink family". That seems like tough work.

You proposed splice, others have proposed multicast or netlink. If I have understood correctly, splice helps copy data to different destinations very quickly. But it needs a userspace program that receives the data, iterates over the fds, and splices the data "out", issuing a syscall for each destination. Syscalls may have become very fast, but we still notice slowdowns for the reasons I explained before.

---

(The following is not related to IPN, but I wanted to answer this too.)

> I'm not familiar enough with ptrace vs utrace to know this argument. but I
> haven't heard any of the virtualization people complaining about the
> existing interfaces. They seem to have been happily using them for a
> number of years.

ptrace has a number of drawbacks that have been partially addressed by adding flags and parameters for "cheating" and obtaining better performance. It's *slow*.
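Returning to splice for a moment: the per-destination fan-out loop described above looks roughly like this in user space. This is an illustrative sketch, not code from the IPN patch; all names are made up, and it shows the pattern being criticized - one syscall per destination per frame:

```c
/* Sketch of user-space fan-out with tee(2): one source pipe, N
 * destination pipes, and one syscall per destination per frame. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

enum { NDEST = 2 };

/* Copy `len` bytes sitting in the source pipe into every destination
 * pipe, then drain the source.  Returns 0 on success, -1 on error. */
static int fan_out(int src_rd, int dst_wr[NDEST], size_t len)
{
    char sink[4096];

    for (int i = 0; i < NDEST; i++)
        /* tee() duplicates pipe contents without consuming them,
         * but it is still one syscall per destination. */
        if (tee(src_rd, dst_wr[i], len, 0) != (ssize_t)len)
            return -1;

    /* Consume the original data so the next frame can be queued. */
    return read(src_rd, sink, len) == (ssize_t)len ? 0 : -1;
}

/* End-to-end demo: returns 0 if every destination saw the frame. */
static int run_demo(void)
{
    static const char frame[] = "one-ethernet-frame";
    int src[2], dst[NDEST][2], dst_wr[NDEST];
    char buf[sizeof frame];

    if (pipe(src) < 0)
        return -1;
    for (int i = 0; i < NDEST; i++) {
        if (pipe(dst[i]) < 0)
            return -1;
        dst_wr[i] = dst[i][1];
    }

    if (write(src[1], frame, sizeof frame) != (ssize_t)sizeof frame)
        return -1;
    if (fan_out(src[0], dst_wr, sizeof frame) < 0)
        return -1;

    for (int i = 0; i < NDEST; i++) {
        if (read(dst[i][0], buf, sizeof buf) != (ssize_t)sizeof buf)
            return -1;
        if (memcmp(buf, frame, sizeof frame) != 0)
            return -1;
    }
    return 0;
}
```

With dozens or hundreds of virtual cables on the same switch, that loop is dozens or hundreds of tee()/splice() calls per frame - the overhead the proposal wants to replace with a single kernel-side delivery.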
Re: [PATCH 0/1] IPN: Inter Process Networking
On Mon, Dec 17, 2007 at 03:31:48AM -0800, [EMAIL PROTECTED] wrote:
> wouldn't it be better to just add the ability for multiple writers to send
> to the same pipe, and then have all of them splice into the output of that
> pipe? this would give the same data-agnostic communication that you are
> looking for, and with the minor detail that software would have to filter
> out messages that they send, would appear to meet all the goals you are
> looking at, using existing kernel features that are designed to be very
> high performance.

Being able to define both filtering policies and delivery guarantees (lossless vs best-effort) is not a minor detail in our opinion. Think of a virtual ethernet layer 2 switch: we have situations where dozens or hundreds of virtual cables are connected to the same switch, and it would be much, much slower if you had to wake up all the user processes for each single non-broadcast ethernet frame and send them useless data.

We might have added a level-2 virtual ethernet switch at the kernel level, but it seemed too specific. With a minor extra effort we split the "dumb" bus (IPN) from the ability to process specific structured data with specific policies (sub-modules such as kvde_switch).

We could certainly adapt existing features (AF_UNIX, or pipes), but they have a well-established interface and semantics, and we think it is better to add a new family. This avoids breaking what already exists and leaves more freedom to define the new family according to our needs.

As for ptrace vs utrace: ptrace was designed for debugging; trying to bend it to fit virtualization is likely to end up in an intricate interface and implementation. utrace was designed in a much more general way: you can implement ptrace on top of utrace, but you can also use utrace for virtualization in a cleaner, simpler, and more efficient way. Why not?
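To make the intended interface concrete, here is a rough usage sketch of the "dumb bus" side. Everything below is an assumption: the AF_IPN constant and its value, the raw-socket type, and the connect-to-pathname convention are reconstructed guesses as far as any stock kernel is concerned, since IPN never entered mainline, and socket() is expected to fail with EAFNOSUPPORT there:

```c
/* Hypothetical AF_IPN usage: members join a bus named by a filesystem
 * path; data written by one member is delivered to the others, with
 * filtering/policy handled by a kernel sub-module (e.g. kvde_switch).
 * AF_IPN never entered mainline, so the constant below is made up and
 * socket() should fail on a stock kernel. */
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#ifndef AF_IPN
#define AF_IPN 34                /* hypothetical family number */
#endif

/* Join the bus at `path`; returns a socket fd, or -1 with errno set. */
static int ipn_join(const char *path)
{
    struct sockaddr_un sun;
    int fd = socket(AF_IPN, SOCK_RAW, 0);
    if (fd < 0)
        return -1;               /* EAFNOSUPPORT without the IPN module */

    memset(&sun, 0, sizeof sun);
    sun.sun_family = AF_IPN;
    strncpy(sun.sun_path, path, sizeof sun.sun_path - 1);
    if (connect(fd, (struct sockaddr *)&sun, sizeof sun) < 0) {
        int err = errno;
        close(fd);
        errno = err;
        return -1;
    }
    return fd;                   /* send()/recv() now reach the whole bus */
}
```

The point of the split is visible even in this sketch: the bus itself is data-agnostic, and anything that looks at frame contents (MAC learning, per-port delivery) lives in a sub-module behind it.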
Ludovico
--
<[EMAIL PROTECTED]> #acheronte (irc.freenode.net)
ICQ: 64483080  GPG ID: 07F89BB8
Jabber: [EMAIL PROTECTED]  Yahoo: gardenghelle
--
This is signature nr. 3556
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/