On Apr 8, 2008, at 11:59 PM, Adam Richards wrote:

You're looking at creating 1:1 mappings from internal IPs to 150-500k public IPs.

No. Sorry, I should've been clearer: 1:1 mappings between, say, a /18 worth of public IP space and something like a /13 worth of possible private IP space.

Okay.

It's not 1:1 over the entire mathematical set, obviously, but rather 1:1 between (public, private) IP pairs for an arbitrarily defined duration. Maybe a requirement is that I'll never have more concurrent connections than I have public IPs?

Yes. Given a /18 as your smallest address pool, you can have a max of ~16k 1:1 mappings.
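
For concreteness:

        /18 public  -> 2^(32-18) = 16,384 addresses
        /13 private -> 2^(32-13) = 524,288 addresses

so at any instant, at most 16,384 of the ~524k candidate internal addresses can hold a mapping.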

Maybe this will change in the future such that I may need 1:N mapping?

Note that 1:N requires flow tracking, even if it's only as granular as remote:internal/remote:external address pairs.
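
For example, if two internal hosts share one external address, return traffic can only be told apart by remembering the remote peer, i.e. per-flow entries along the lines of (addresses invented):

        (remote 203.0.113.9,  external 198.51.100.1) -> internal 10.1.2.3
        (remote 203.0.113.44, external 198.51.100.1) -> internal 10.1.2.7

(This breaks down once both internal hosts talk to the same remote address, at which point you need ports in the key as well.)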

However, it is clear that in stateful NATing with your typical non-TE'd network, return traffic will always come back through the same NAT device irrespective of the internal server's network locality to the Internet. That is, the internal server (or VIP) may live closer, topologically, to "Internet Feed B", whereas the NAT device may live closer to "Internet Feed A". Thus, depending on where internal services live, standard stateful NATing may inhibit return traffic from egressing the *nearest* exit. This is the opposite of what I want!

But it's a requirement inherent in NAT itself; the traffic must go through the same mapping in both directions. It's not a matter of stateful vs stateless, it's a matter of where you do NAT. (I think we're on the same page but differing in use of "state", see below.)

* You want to distribute NAT by implementing it on or near each border. In order for anything bidirectional to function, the mappings must be consistent

Yup.

so that implies synchronizing state between them.

Well, it depends on what we mean by "synchronizing state".

I'm using state in a general sense here, not in pf-specific terms.

Static NAT is the kind of thing you'd set up on a simple rule-based engine, like a couple "binat" lines to map one block to another of the same size. I'd call this stateless.
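
Something like this, for instance, mapping one /24 onto another ($ext_if is an assumed interface macro, addresses are placeholders):

        binat on $ext_if from 10.1.2.0/24 to any -> 198.51.100.0/24

Each 10.1.2.x is translated to the corresponding 198.51.100.x in both directions, and no per-connection decision is ever made.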

Dynamic NAT is what you get when you've got more addresses on one side than the other, and need to apply policy decisions to allocate and remove mappings. Since you have to keep track of the NAT mappings in use, I consider this stateful.
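
The pf.conf equivalent would be something along these lines (placeholder addresses again; pf picks a translation address from the pool per connection):

        nat on $ext_if from 10.0.0.0/13 to any -> 198.51.100.0/26

The choice pf makes for each connection has to be remembered so replies can be reversed; that memory is the state.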

Besides internal-external address mappings, pf's state engine also tracks remote address, protocol, TCP/UDP port numbers, and TCP connection state and sequence numbers. The TCP connection state and sequence checks are firewall operations, but the rest apply to NAT. This is just a matter of how much state is tracked, and determines how large N can be for 1:N scenarios.
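
So, roughly speaking, a full pf state entry for a NATed connection keys on something like:

        (proto, internal addr:port, translated addr:port, remote addr:port)
        plus TCP connection state and sequence windows

whereas pure 1:1 translation only ever needs the (internal, external) address pair.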

I want to have the same mapping information base exist everywhere, but NAT state (i.e., flow-state matching) is NOT required nor needed. Just a simple unidirectional L3 translation, and then we move on our merry way. I say unidirectional to mean that the NAT device doesn't drop a packet if it results in a flow-matching miss. This is just the old-style "dumb NATing" from yesteryear. As a matter of fact, it's the same NAT which used to exist on the Linux side of the house (and which, incidentally, is being re-introduced; see another part of the thread mentioned above: <http://kerneltrap.org/mailarchive/linux-netdev/2007/9/27/323772>).

As Ryan mentioned, pf isn't equipped for this. pf is designed as a firewall first, and happens to have NAT abilities as an easy bonus. As such, the rule-driven part of it expects to have a relatively small ruleset compared to the number of states to be tracked. The ruleset requires a linear scan and simply won't scale to very large numbers of rules, but does support partial matching, while the state engine is built to handle large numbers of exact matches. (That tends to be true of a lot of systems, including the iptables engine, hence the patch above to use a completely different method of specifying mappings. iptables can do it, it just can't scale.)

In order to get pf doing what you want, you'll essentially have to cripple the state engine: first remove the firewalling logic, then scale back the amount of state tracked until it's as dumb as you want. Then add an interface to add entries to the state table itself, instead of using the ruleset. Or build your own engine based on pf's state organization.

Honestly, if that netfilter patch does what you want, I'd seriously consider using it. It'd be far easier than ripping apart pf. (It won't do 1:N, but I suspect you'll need a different approach for that anyway.)


I would like to preface my inline replies by stating what my original goal was, and still is, in starting this thread:

        I want to persuade the pf community that stateless NAT is a
        desired feature and should be part of the core code.  :)  My
        use-case is just one example of how this can be a useful
        feature.

You haven't actually given a general use case for dumb NAT:

You talk about a high rate of mapping changes, so clearly you're managing entries dynamically, which is just another way of keeping state.

Keeping state, yes, but not by the NAT layer. Perhaps by an external system that correlates a lot more information than just IPs? Many pieces tied together to enable highly resilient services across a shared substrate?

Regarding the rate of mapping changes, yes, I need to be able to sustain a high rate of changes per second across the *whole* cluster of NAT devices without a significant forwarding performance hit.

Those who need static NAT would set it up in blocks, without trying to define each individual one of thousands of mappings. Those who need dynamic NAT would put NAT as close to a central point as possible, since it is stateful. Those who need redundancy would split the point in two and use something like CARP for failover. Those who have sufficient traffic that engineering a central point is impractical usually don't need NAT.

Those who need to explicitly control thousands of mappings are already extremely rare. I could see it for something like Amazon's EC2 clouds, where they're using virtualization such that they may need to transport a VM to another internal physical machine and must update the external IP mapping appropriately. (I suspect they're just leasing the IP to the internal machine directly, though. I don't know for sure.)

But you've previously said:

[...] perhaps 10's of operations per second on a table of 500,000 entries/mappings


If we assume a change rate of 10 per second across 16k mappings, a given external IP is going to migrate to another internal machine every 27 minutes on average. That's too fast for typical virtualization scenarios (just transporting the VM is usually too much overhead). You clearly can't have long-lasting TCP connections, and if you were using TCP for anything longer than a few seconds you wouldn't want to update a mapping that often, because it would break any connections active at the time of the move. Those arguments also rule out most forms of load balancing, and in any case you usually want load balancing to redirect mappings at a much faster rate.
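
(The arithmetic behind the 27 minutes: 16,384 mappings / 10 changes per second = 1,638 seconds, or about 27 minutes between changes to any given mapping.)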

An external system managing the mappings can't snoop traffic to keep up very well (by the time the mapping is updated, the first packet will already have been lost or misdirected). That rules it out for flow tracking in particular (so no 1:N), and if it was in a position to observe all the traffic it could just be doing NAT in-line anyway. If you have sufficient control over the internal machines to have them signal the mapping system before doing anything, then the internal machines could do NAT themselves or lease an external IP directly.

So whatever policy decisions you're using to drive this don't fit any general use scenarios I can think of. Combine that with wanting to use an external managing system at all, and the apparent need for distributed NAT systems, and you're doing something way off the beaten path. It's hard to see how any tool would fit that use case out of the box, even if pf did support dumb NAT already.

Sorry for the lengthy missive. I'm erring on the side of verbosity. :)


No worries, it helps in conversations like this :)
