On Sun, Nov 16, 2008 at 09:24:19PM +0000, Eris Discordia wrote:
>> That isn't happening.  All we have is one TCP connection and one small
>> program exporting file service.
>
> I see. But then, is it the "small program exporting file service" that 
> does the multiplexing? I mean, if two machines import a gateway's /net 
> and both run HTTP servers binding to and listening on *:80 what takes 
> care of which packet belongs to which HTTP server?

I don't think you've quite got it yet...  Also, I swore I wouldn't post in
this thread.  Oh well, here goes.

First, let me draw a picture, just to introduce the characters:

+----------+              +---------+              
| Internal |<--Ethernet-->| Gateway |<--Ethernet-->(Internet)
| Computer |   ^          +---------+              
+----------+   |
               |
+----------+   |
|  Other   |<--+
| Internal |
| Computer |
+----------+

OK.  Here, we have two internal computers (IC and OIC) and a gateway G.
There are two Ethernet networks in flight, one connecting IC, OIC, and G,
and the other connecting G to the internet at large, somehow (e.g. ADSL).

IC and OIC both initialize somehow, say using DHCP from G, and bring their
network stacks up using this information.  Their kernels now provide the
services that will be mounted, by convention, at /net, again using this
private information.  G initializes statically, bringing its IP stack up
using two interfaces, with two routes: one for internal traffic, and one
default route.
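
(Concretely, the bring-up on IC looks roughly like the following, much as
a termrc would do it.  This is a sketch; the device numbers are whatever
your kernel actually hands you:)

    # bind the kernel ethernet driver and IP stack into the namespace
    bind -a '#l0' /net      # appears as /net/ether0
    bind -a '#I' /net       # appears as /net/tcp, /net/udp, ...
    # configure the stack, by default via DHCP on that interface
    ip/ipconfig ether /net/ether0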

So far, it's all very similar between Plan 9 and Linux.  Here's where
our story diverges.

In Linux, IC and OIC are both taught that their default route to the
Internet At Large runs through G's internal ethernet interface: they send
their outbound IP datagrams there, and how this works in more detail, they
don't care.  G will also send them the corresponding inbound IP datagrams
from its interface; that's all they care about.  Meanwhile, G works
furiously to decode the network traffic bound for the internet and NA(P)T
it out to its gateway, maintaining very detailed tables about TCP
connections, UDP datagrams, and so on.  How could it be any other way?
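
(On the Linux side, that translation is typically a single masquerading
rule plus the kernel's connection-tracking machinery doing the real work.
A sketch, assuming eth1 is G's external interface:)

    # Linux G: rewrite outbound packets to use the external address;
    # conntrack remembers every connection so replies can be mapped back
    iptables -t nat -A POSTROUTING -o eth1 -j MASQUERADE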

Let's redraw the picture, as things stand now, under Plan 9, just so things
are clear:

                                                  (OIC similar to IC)
+----Internal Computer--------------------+               |
| bind '#I' /net                          |               |
| #I : Kernel IP stack (192.168.1.2)      |               |
| #l : Kernel ethernet driver             |<---Ethernet---+
+-----------------------------------------+               |
                                                          |
+----Gateway------------------------------+               |
| bind '#I' /net                          |               |
| #I : Kernel IP stack (192.168.1.1)      |               |
|                      (4.2.2.2)          |               |
| #l : Kernel ethernet driver  (ether0)   |<---Ethernet---+
|                              (ether1)   |<---Ethernet----->(Internet)
+-----------------------------------------+

In Plan 9, G behaves like any other machine, building a /net out of pieces
exported by its kernel, including the bits that know how to reach the
internet at large through the appropriate interface.  Good so far?

Let's have G run an exportfs, exposing its /net on the internal IP address.
This /net knows how to talk to the internal addresses and the external ones.
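
(One plausible way for G to offer that service, reusing the exportfs line
from the picture below under the standard listener; the -t flag and port
564, the 9fs port, are my assumptions about a typical setup:)

    # on G: answer 9P calls on the internal address, serving /net
    aux/listen1 -t 'tcp!192.168.1.1!564' /bin/exportfs -a -r /net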

Meanwhile, IC can reach out and import G's /net, binding it at /net.alt,
let's say.  Now, programs can talk to the Internet by opening files in
/net.alt.  These open requests will be carried by IC's mount driver, and
then by IC's network stack, to G, whereupon the exportfs (in G's userland)
will forward them to its idea of /net (by open()ing, read()ing, write()ing,
etc.), namely the /net built on G's kernel, which knows how to reach the
Internet.  Tada!  Picture time:

                                                        (OIC)
+----Internal Computer-------------------------+          |
| abaco: open /net.alt/tcp/clone               |          |
|                                              |          |
| import tcp!192.168.1.1!9fs /net.alt (devmnt) |          |
| bind '#I' /net                               |          |
| #I : Kernel IP stack (192.168.1.2)           |          |
| #l : Kernel ethernet driver                  |<---------+
+----------------------------------------------+          |
                                                          |
+----Gateway------------------------------+               |
| exportfs -a -r /net                     |               |
|                                         |               |
| bind '#I' /net                          |               |
| #I : Kernel IP stack (192.168.1.1)      |               |
|                      (4.2.2.2)          |               |
| #l : Kernel ethernet driver  (ether0)   |<---Ethernet---+
|                              (ether1)   |<---Ethernet----->(Internet)
+-----------------------------------------+

This works perfectly for making connections: IC's IP stack is aware only of
devmnt requests, and G's IP stack is aware only of some traffic to and from
a normal process called exportfs, which happens to be making network
requests via #I bound at /net.
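
(To make the file-level mechanics concrete, here's what a client like
abaco effectively does, driven by hand from rc.  Untested sketch, and the
address is illustrative:)

    # opening the clone file yields the ctl file of a fresh conversation;
    # reading that fd back gives the conversation number
    {
        n=`{read}
        # ask G's stack, via exportfs, to dial out for us
        echo connect 4.2.2.2!80 >[1=0]
        # the conversation's data now flows through /net.alt/tcp/$n/data
    } <>/net.alt/tcp/clone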

The beauty of this design is just how well it works, everywhere, for
everything you'd ever want.

Now, suppose IC goes to listen on TCP:80, by opening /net.alt/tcp/clone.
The same flow of events happens and, to a certain extent, G's network stack
thinks that the exportfs program (running on G) is listening on TCP:80.
exportfs dutifully copies the /net data back to its client.
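
(In the same hand-driven terms as the sketch above, and with the same
caveats, listening is just a different ctl message:)

    {
        n=`{read}
        echo announce 80 >[1=0]     # claim TCP:80 on G's stack
        # each incoming call is then accepted by opening
        # /net.alt/tcp/$n/listen, which blocks until a call arrives
        # and returns the ctl file of a new conversation
    } <>/net.alt/tcp/clone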

Naturally, if another program on G were already listening on TCP:80, or the
same program (possibly exportfs) attempted to listen twice (if, say, OIC
played the same game and also tried to listen on G's TCP:80), it would be
told that the port was busy.  This error would be carried back along the
exportfs path just like any other.

So as you see, there is no need to take care of "which packet belongs to
which server" since there can, of course, be only one server listening on
TCP:80.  That server is running on G, and behaves like any other program.
That it just so happens to be an exportfs program, relaying data to and from
another computer, is immaterial.

This also works for FTP, and UDP, and ESP (which are notorious problems in
the NAT world), and for IL, and for IPv6, and for ethernet frames (!), and
... you get the idea.  It does this with no special tools, no complex code,
and a very minimal per-connection overhead (just the IC and G kernels and
G's exportfs tracking a file descriptor).

There are no connection tracking tables anywhere in this design.  There are
just normal IP requests over normal ethernet frames, and a few more TCP (or
IL) connections transporting 9P data.

> On a UNIX clone, or on Windows, because there is exactly one TCP/IP  
> protocol stack in usual setups no two programs can bind to the same port 
> at the same time. I thought Plan 9's approach eliminated that by keeping 
> a distinct instance of the stack for each imported /net.

There can, in fact, be multiple IP stacks on a Plan 9 box, bound to multiple
Ethernet cards, as you claim.  (One can even import another computer's
ethernet cards and run snoopy, or build a network stack, using those
instead.)  I don't think that's relevant to the example at hand, though, as
should be clear from the above.
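
(For instance, on a box with two cards, a second stack instance might be
brought up like this; a sketch, with /net.alt as the conventional second
mount point and ipconfig's -x flag naming it:)

    # second instances of the ethernet driver and IP stack
    bind -a '#l1' /net.alt
    bind -a '#I1' /net.alt
    # configure the second stack over the second card
    ip/ipconfig -x /net.alt ether /net.alt/ether1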

--nwf;
