Hi Matt, I am glad you are looking deeply into the issue of how to deploy vastly distributed AGI. I have some follow-up questions regarding your informative comments.
> ...but NAT hole punching is not really a solution. I have a home router/firewall/NAT that by default blocks all incoming traffic (TCP, UDP, ICMP) not initiated by me. I could configure the firewall to open a port for P2P traffic, but in order to get a peer widely deployed, there has to be an incentive for users to do so.

I've read the n2n paper and downloaded their software, but not yet actually evaluated it. All my home computers are behind the same NAT/router/firewall (i.e. a Netgear Cable/DSL router), so I have postponed an evaluation, which would require collaborating with someone else who is also behind a NAT.

My understanding of n2n is that, as you say, each end user periodically contacts a public, redundant super node to maintain a lease. The client opens the TCP socket connection, and the super node accepts the connection request. This behavior should not be a problem with a firewall or NAT (the Network Address Translation facility in the user's router), because the end user behind the NAT has the client role, not the server role. Then the super node can inform each end user's client of the other's NAT-provided IP address and port, so that the peers can directly exchange packets as though they were servers. The key point is that initially both of them opened a TCP socket as a client; they did not accept a connection as a server.

Only so-called symmetric NATs would force the super node to relay every packet from peer to peer. There is some downloadable software that will determine whether your NAT is symmetric. Mine, like most, is not. With the advent of Skype and other P2P VoIP (Internet phone) services, users are purchasing NATs that are not symmetric, but rather support the NAT hole punching that Skype employs.

Matt, given my clarification, would you then agree that you as an end user need no firewall or router configuration beyond the defaults to make use of n2n? Or could you please correct my understanding of the n2n paper here?

> I also looked at N2N. It creates a virtual IP layer so it should work for both TCP and UDP. However it still needs supernodes with public IP addresses to set up connections. It also appears that the software needs to run at both ends.

I agree that public (i.e. static) IP addresses are needed for the super nodes, and I agree that n2n software is required for each peer user. As long as the super nodes are not relaying any TCP messages between peers (e.g. those behind symmetric NATs), then I believe that a few redundant super nodes can support a vast number of clients, in the same fashion as the Network Time Protocol is deployed.

My plan is to deploy Texai in a distributed fashion with minimal reliance on centralized infrastructure and also at the lowest possible cost. I think that I can afford a few dedicated servers at ServerPronto for 30 USD per month per server. Perhaps a couple of these could handle a million Texai clients at less than 2,000 lease-renewal connections per second apiece. I would distribute the n2n software as part of the Texai end-user download, as it has the GPL license. A setup wizard would inform the user if they had a symmetric NAT, which would require replacement before Texai could benefit them.

> Maybe for distributed AI you could get away with not using encryption, but you will still need some way to authenticate users. Otherwise your system is vulnerable to malicious users inserting bogus data or spam while impersonating another source with a high reputation.

The n2n software fully encrypts TCP messages by providing a VPN (Virtual Private Network). So this should not be a problem, right? Texai will have to provide a user authentication facility anyway to attach credentials to what a user teaches it, and also to safeguard private information designated by the user.

Thanks for the advice.
-Steve

Stephen L. Reed
Artificial Intelligence Researcher
http://texai.org/blog
http://texai.org
3008 Oak Crest Ave.
Austin, Texas, USA 78704
512.791.7860

----- Original Message ----
From: Matt Mahoney <[EMAIL PROTECTED]>
To: agi@v2.listbox.com
Sent: Monday, May 5, 2008 1:43:20 PM
Subject: Re: [agi] organising parallel processes

--- Stephen Reed <[EMAIL PROTECTED]> wrote:
> Matt (or anyone else), have you gotten as far as thinking about NAT
> hole punching or some other solution for peer-to-peer?

Yes, but NAT hole punching is not really a solution. I have a home router/firewall/NAT that by default blocks all incoming traffic (TCP, UDP, ICMP) not initiated by me. I could configure the firewall to open a port for P2P traffic, but in order to get a peer widely deployed, there has to be an incentive for users to do so.

I looked at Skype.
http://www1.cs.columbia.edu/~library/TR-repository/reports/reports-2004/cucs-039-04.pdf

Their approach (as of 2004) to initiate a connection is to first try UDP, and if that is blocked by a firewall, then try TCP on a high port number, and then on ports 80 and 443 which aren't usually blocked. If none of these work (like trying to contact me), then both peers have to relay messages through a supernode with a public (not necessarily static) IP address that admits at least one of these ports. I would have to continually poll supernodes to see if I have any incoming calls.

This is convenient because users can run Skype without reconfiguring their firewalls. However it adds traffic to supernodes. The Skype software detects if you have a firewall/NAT, and if not, your computer automatically becomes a supernode and there is no option (as of 2004) to turn this off. This creates an incentive to install a NAT, which is the opposite of what you want.

I also looked at N2N. It creates a virtual IP layer so it should work for both TCP and UDP. However it still needs supernodes with public IP addresses to set up connections. It also appears that the software needs to run at both ends.

A second problem is that Skype is not truly distributed.
There is a central login server. Skype traffic is encrypted with AES using public key RSA to negotiate keys. However, the login server has to validate the public keys.

The problem is fundamental to public key cryptography. I can publish a public key so that anyone can send me encrypted messages that only I can read. However, this does not stop people from saying "I'm Matt and here is my public key" to read messages intended for me. The Skype login server prevents this impersonation.

SSH has this problem too. To set up a secure server, I register my public key with a "trusted authority" which can vouch that I am really me by signing my key. Another authority has to vouch for it, ultimately leading to a chain of certificate authorities leading back to a root authority like Verisign or Microsoft.

Maybe for distributed AI you could get away with not using encryption, but you will still need some way to authenticate users. Otherwise your system is vulnerable to malicious users inserting bogus data or spam while impersonating another source with a high reputation.

-- Matt Mahoney, [EMAIL PROTECTED]

-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: http://www.listbox.com/member/?&
Powered by Listbox: http://www.listbox.com
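
As a rough check on the lease-renewal estimate in Steve's reply (a couple of super nodes, a million clients, fewer than 2,000 renewals per second apiece), the arithmetic can be sketched in Python. This is only a back-of-the-envelope model: it assumes "a couple" means exactly two servers, that clients are spread evenly across them, and that renewals arrive uniformly over time. The function names are hypothetical and nothing here comes from the n2n software; the lease interval itself is not stated in the thread and is derived here.

```python
def min_lease_interval_s(clients: int, servers: int, max_renewals_per_s: float) -> float:
    """Shortest lease interval (seconds) that keeps each server at or
    below max_renewals_per_s, assuming an even spread of clients."""
    clients_per_server = clients / servers
    return clients_per_server / max_renewals_per_s


def renewals_per_second(clients: int, servers: int, lease_interval_s: float) -> float:
    """Average renewal rate seen by one server when every client renews
    once per lease_interval_s (assumed uniform arrival)."""
    return clients / servers / lease_interval_s


if __name__ == "__main__":
    # Figures from the message: one million clients, two servers,
    # a ceiling of 2,000 renewals per second per server.
    interval = min_lease_interval_s(1_000_000, 2, 2_000)
    print(f"minimum lease interval: {interval:.0f} s")
    print(f"rate at that interval: {renewals_per_second(1_000_000, 2, interval):.0f} per second")
```

Under these assumptions the 2,000-per-second ceiling is met as long as each client renews its lease no more often than about once every 250 seconds. A shorter lease, or uneven load across the super nodes, would push the per-server rate above that figure.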