On Fri, 2006-04-28 at 12:21 -0700, David S. Miller wrote:
> From: Rusty Russell <[EMAIL PROTECTED]>
> Date: Fri, 28 Apr 2006 18:24:08 +1000
>
> > Note that the problem space AFAICT includes strange advanced routing
> > setups, ingress qos and possibly others, not just netfilter.  But
> > perhaps the same solutions apply, so I'll concentrate on nf.
>
> Yes, this hasn't been mentioned explicitly yet.
>
> The big problem is that we don't want the classifier to become
> overly complex.
>
> One scheme I'm thinking about right now is an ordered lookup
> that looks like:
>
> 1) Check for established sockets, they trump everything else.
>
> 2) Check for classifier rules, ie. netfilter and packet scheduler
>    stuff
>
> 3) Check for listening sockets
>
> 4) default channel
>
> #2 is still an unsolved problem, we don't want this big complex
> classifier to be required in the hardware implementations.
> However, using just IP addresses and ports does not map well to
> what netfilter and co. want.
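
For readers following the thread, the ordered lookup quoted above could be sketched roughly as below. This is only an illustration under stated assumptions: `struct flow_key`, `struct channel_tables`, `classify()` and the port-based stand-in for classifier rules are all hypothetical names invented here, and linear arrays stand in for the real socket hashes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-tuple; real code would use the kernel's own structures. */
struct flow_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

enum channel_kind {
    CHAN_ESTABLISHED, /* 1) established socket */
    CHAN_CLASSIFIER,  /* 2) netfilter / packet scheduler rule */
    CHAN_LISTENING,   /* 3) listening socket */
    CHAN_DEFAULT      /* 4) everything else */
};

/* Illustrative tables; plain arrays stand in for the real hashes. */
struct channel_tables {
    const struct flow_key *established;
    size_t n_established;
    const uint16_t *rule_ports;   /* crude stand-in for classifier rules */
    size_t n_rule_ports;
    const uint16_t *listen_ports;
    size_t n_listen_ports;
};

static bool key_equal(const struct flow_key *a, const struct flow_key *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport;
}

static bool port_in(const uint16_t *ports, size_t n, uint16_t port)
{
    for (size_t i = 0; i < n; i++)
        if (ports[i] == port)
            return true;
    return false;
}

/* Walk the stages in the quoted order; the first match wins. */
static enum channel_kind classify(const struct channel_tables *t,
                                  const struct flow_key *pkt)
{
    for (size_t i = 0; i < t->n_established; i++)
        if (key_equal(&t->established[i], pkt))
            return CHAN_ESTABLISHED;
    if (port_in(t->rule_ports, t->n_rule_ports, pkt->dport))
        return CHAN_CLASSIFIER;
    if (port_in(t->listen_ports, t->n_listen_ports, pkt->dport))
        return CHAN_LISTENING;
    return CHAN_DEFAULT;
}
```

Note that in this ordering an exact established match is returned before any rule is consulted, which is the bypass being questioned in the reply.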

You're still thinking you can bypass classifiers for established
sockets, but I really don't think you can.  I think the simplest
solution is to effectively remove from (or flag) the established &
listening hashes anything which could be affected by classifiers, so
those packets get sent through the default channel.

This can graduate from "all or nothing" to some more fine-grained scheme
over time.  I have some early thoughts on how we could really do this
with filtering by connection tracking state; serious work, but feasible.

> > Ah, this is a different problem.  Our idea was to have a syscall which
> > would check & sanitize the buffers for output.  To do this, you need the
> > ability to chain buffers (a simple next entry in the header, for us).
> >
> > Sanitization would copy the header into a global buffer (ie. not one
> > reachable by userspace), check the flowid, and chain on the rest of the
> > user buffer.  After it had sanitized the buffers, it would activate the
> > NIC, which would only send out buffers which started with a kernel
> > buffer.
> >
> > Of course, the first step (CAP_NET_RAW-only) wouldn't need this.  And,
> > if the "sanitize_and_send" syscall were PF_VJCHAN's write(), then the
> > contents of the write() could actually be the header: userspace would
> > never deal with chained buffers.
>
> I am not sure any of this is anything more than overhead.
>
> If we just pop the buffers directly into the user mmap()'d ring
> buffer, headers and all, and give an offset+length pair so the
> user knows where the data starts and how much data is there, it
> should all just work out.  Where to put the offset+length is
> just a detail.

Agreed, but I was talking about userspace *send*, in reply to Caitlin
bringing it up.  A little off-topic, but I mentioned our thoughts simply
to show that it's possible to do unpriv'ed output...

(Kelly is taking a couple of well-earned days off ATM).

Cheers!
Rusty.
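
A rough sketch of the "flag the hashes" idea from the first paragraph of the reply, for illustration only. Every name here (`struct sock_entry`, `via_default`, `flag_all()`, `flag_matching()`, `pick_channel()`, `DEFAULT_CHANNEL`) is hypothetical, a flat array stands in for the established and listening hashes, and matching on a local port is merely a placeholder for whatever a classifier rule might really test.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_CHANNEL 0 /* hypothetical id of the default channel */

/* One entry of a (hypothetical) established or listening table. */
struct sock_entry {
    uint16_t port;    /* local port; stand-in for the full lookup key */
    int channel;      /* per-socket channel id, assumed nonzero */
    bool via_default; /* set when a classifier might affect this flow */
};

/* "All or nothing": if any rule is loaded, every entry is flagged. */
static void flag_all(struct sock_entry *tbl, size_t n, bool rules_loaded)
{
    for (size_t i = 0; i < n; i++)
        tbl[i].via_default = rules_loaded;
}

/* Finer-grained: flag only entries that a given rule could touch. */
static void flag_matching(struct sock_entry *tbl, size_t n,
                          uint16_t rule_port)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].port == rule_port)
            tbl[i].via_default = true;
}

/* Flagged or unknown flows fall through to the default channel. */
static int pick_channel(const struct sock_entry *tbl, size_t n,
                        uint16_t port)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].port == port)
            return tbl[i].via_default ? DEFAULT_CHANNEL : tbl[i].channel;
    return DEFAULT_CHANNEL;
}
```

Flagged flows then take the ordinary path where the classifiers run, while unflagged ones keep their direct channel.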
-- 
ccontrol: http://ozlabs.org/~rusty/ccontrol