Re: Can P2P applications learn to play fair on networks?

2007-10-22 Thread Perry Lorier





 Will P2P applications really never learn to play nicely on the network?



So from an operations perspective, how should P2P protocols be designed?

It appears that the current solution is for ISPs to put up barriers to
P2P usage (such as Comcast's spoofed RSTs), and thus P2P clients are
trying harder and harder to hide themselves to work around these barriers.


Would a way to proxy p2p downloads via an ISP-operated proxy be used by
ISPs, and not abused as an additional way to shut down and limit p2p
usage?  If so, how would clients discover these proxies, or should they
be manually configured?


Would stronger topological sharing be beneficial?  If so, how do you
suggest end-user software get access to the information required to
make these decisions in an informed manner?  Should p2p clients be
participating in some kind of weird IGP?  Should they participate in
BGP?  How can the p2p software understand your TE decisions?  At the
moment p2p clients upload to a limited number of people; every so often
they discard the slowest person and choose someone else.  In theory
this means they avoid slow/congested paths in favour of faster ones.
Another easy metric they can probably get at is RTT; is RTT a good
metric of where operators want traffic to flow?  p2p clients could also
perhaps do similarity matches on the remote IP and try to choose
people with similar IPs; presumably that isn't going to work well for
many people, but would it be enough to help significantly?  What else
should clients use as metrics for selecting their peers that works in
an ISP-friendly manner?
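
For concreteness, here is a minimal sketch of that IP-similarity idea
(not from any real client, and assuming plain IPv4 and documentation
addresses): score a candidate peer by the length of the prefix it
shares with the local address, and prefer higher scores as a weak proxy
for topological nearness.

#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>

/* Length of the common leading prefix of two IPv4 addresses
   (both in host byte order). */
static int common_prefix_len(uint32_t a, uint32_t b)
{
    uint32_t diff = a ^ b;
    int len = 0;
    while (len < 32 && !(diff & 0x80000000u)) {
        diff <<= 1;
        len++;
    }
    return len;
}

int main(void)
{
    struct in_addr me, peer;
    inet_pton(AF_INET, "203.0.113.10", &me);    /* example addresses only */
    inet_pton(AF_INET, "203.0.113.200", &peer);
    printf("shared prefix: %d bits\n",
           common_prefix_len(ntohl(me.s_addr), ntohl(peer.s_addr)));
    return 0;
}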


If p2p clients started using multicast to stream pieces out to peers,
would ISPs make sure that multicast worked (at least within their AS)?
Would this save enough bandwidth for ISPs to care?  Can enough ISPs
make use of multicast, or would they end up hauling the same data
multiple times across their network anyway?  Are there any other
obvious ways of getting the bits to the user without them passing
needlessly across the ISP's network several times (often in alternating
directions)?


Should p2p clients set ToS/DSCP/whatever-they're-called-this-week bits
to state that this is a bulk transfer?  Would ISPs use these sensibly,
or would they just use these hints to add additional barriers into the
network?
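
As a concrete illustration (a sketch only; which codepoint an ISP would
actually honour is an assumption here, and CS1 is just a commonly cited
"scavenger/bulk" value), a client could mark its sockets like this:

#include <netinet/in.h>
#include <sys/socket.h>

/* Mark an IPv4 socket's traffic as low-priority bulk.  DSCP CS1
   (decimal 8) is assumed; the DSCP occupies the upper six bits of the
   TOS byte. */
int mark_as_bulk(int fd)
{
    int tos = 8 << 2;   /* DSCP CS1 */
    return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}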


Should p2p clients avoid TCP entirely because of its fairness between
flows, and instead implement their own congestion control algorithms on
top of UDP that treat all p2p connections as one single congestion
entity?  What happens if the first implementation is buggy?


Should p2p clients attempt to mark all their packets as coming from a
single application so that ISPs can QoS them as one single entity
(e.g. by setting the IPv6 flow label to the same value for all p2p
flows)?

What incentive can the ISP provide the end user to keep them from just
turning these features off and going back to the current way things are
done?


Software is easy to fix, and thanks to the automatic update mechanisms
of much p2p software, the network can see a global improvement very quickly.


So what other ideas do operations people have for how these things could 
be fixed from the p2p software point of view? 



Re: Access to the IPv4 net for IPv6-only systems, was: Re: WG Action: Conclusion of IP Version 6 (ipv6)

2007-10-02 Thread Perry Lorier

 What has happened?  Well, application protocols have evolved to 
 accommodate NAT weirdness (e.g., SIP NAT discovery), and NATs have
 undergone incremental improvements, and almost no end-users care about
 NATs.  As long as they can use the Google, BitTorrent and Skype, most
 moms and dads neither know nor care about any technical impediments
 NATs erect between them and their enjoyment of the Internet.

Except every service that used to work using direct TCP connections has
either moved to UDP, or moved towards having unNATted boxes that people
can relay through.

While NAT traversal for TCP is theoretically possible, it relies on
rarely used features of TCP (simultaneous open) and good timing, both of
which are likely to cause issues.  I've never heard of a real-world
application successfully doing this.  (Feel free to educate me if you
know of a real-world application in common use that does TCP NAT
traversal and has it work a significant amount of the time.)

Even p2p apps like bittorrent rely on the fact that there are /some/
people /somewhere/ in the swarm who have either configured their NAT to
allow pinholing or don't have any NAT between them and the Internet.
Plastered everywhere over anything P2P file-transfer related is "poor
performance?  Add a pinhole to your NAT box!", suggesting quite strongly
that NAT is causing large problems for P2P swarms.

NAT is hurting applications today, and applications aren't getting
deployed (or even written) because of problems NAT causes.


Re: Extreme congestion (was Re: inter-domain link recovery)

2007-08-18 Thread Perry Lorier





We've been pitching the idea to bittorrent tracker authors to include 
a BGP feed and prioritize peers that are in the same ASN as the user 
himself, but they're having performance problems already so they're 
not so keen on adding complexity. If it could be solved better at the 
client level that might help, but the end user who pays flat rate has 
little incentive to help the ISP in this case.




Many networking stacks have a TCP_INFO socket option that can be used to
query for more accurate statistics on how a TCP connection is faring
(number of retransmits, TCP's current estimate of the RTT (and jitter),
etc.).  I've often wondered whether bittorrent clients could make use of
this to better choose which connections to prefer and which to avoid.
I'm unfortunately unsure if Windows has anything similar.
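
A rough Linux-only sketch of what that would look like (not from any
existing client; field availability varies by kernel version):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>

/* Print per-connection quality hints for an already-connected TCP socket. */
void report_peer_quality(int fd)
{
    struct tcp_info ti;
    socklen_t len = sizeof(ti);

    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) == 0) {
        /* tcpi_rtt and tcpi_rttvar are in microseconds */
        printf("rtt=%uus jitter=%uus total_retransmits=%u\n",
               ti.tcpi_rtt, ti.tcpi_rttvar, ti.tcpi_total_retrans);
    }
}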


One problem with clients only being told about clients that are near
to them is that the network starts forming cliques.  Each clique works
as a separate network, and you can end up with silly things like one
clique being full of seeders and another clique not having any seeders
at all.  Obviously this means that a tracker has to send a handful of
addresses of clients outside the clique that the current client belongs
to.


You want hosts to talk to peers that are close to them, you want to
make sure that hosts don't form cliques, and you want something that a
tracker can figure out very quickly from information that is easily
available to people who run trackers.  My thought here was to sort all
the IP addresses, and send the next 'n' IP addresses after the client's
IP as well as some random ones.  If we assume that IPs are generally
allocated in contiguous groups, then clients should generally at least
be told about people nearby, and hopefully those hosts aren't too far
apart (at least likely to be within the same LIR or RIR).  Per request
this is an O(log n) lookup into the sorted list, which should be fairly
efficient.
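
A sketch of how a tracker might implement that (illustrative names and
sizes, not any real tracker's code; assumes a flat array of IPv4
addresses in host byte order, re-sorted whenever the swarm changes):

#include <stdint.h>
#include <stdlib.h>

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* peers[] is kept sorted with qsort(peers, npeers, sizeof(uint32_t),
   cmp_u32).  Per announce this is a binary search plus a few picks;
   duplicates between the "near" and "random" sets are tolerated. */
size_t pick_peers(const uint32_t *peers, size_t npeers, uint32_t client,
                  uint32_t *out, size_t want_near, size_t want_random)
{
    size_t lo = 0, hi = npeers, n = 0;

    if (npeers == 0)
        return 0;

    /* binary search for the first peer address greater than the client */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (peers[mid] <= client) lo = mid + 1; else hi = mid;
    }
    for (size_t i = 0; i < want_near && n < npeers; i++)
        out[n++] = peers[(lo + i) % npeers];   /* wrap around the sorted list */
    for (size_t i = 0; i < want_random && n < npeers; i++)
        out[n++] = peers[rand() % npeers];     /* random picks break up cliques */
    return n;
}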


Re: ICANN registrar supporting v6 glue?

2007-06-29 Thread Perry Lorier




One note here is that even though you can get glue into com/net/org
using this method, there is no IPv6 glue for the root yet, as such even
if you manage to get the IPv6 glue in, it won't accomplish much (except
sending all IPv6 capable resolvers over IPv6 transport :) as all
resolvers will still require IPv4 to reach the root. One can of course
create their own root hint zone and force bind, or other dns server, to
not fetch the hints from the real root, but that doesn't help for the
rest of the planet. (Root alternatives like orsn could fix that up but
apparently their main german box that was doing IPv6 is out of the air)
  


Having AAAA glue in gTLDs/ccTLDs will help resolvers that query for
AAAA glue before A glue for nameservers.  If you don't have AAAA glue
then it's going to be an extra RTT to look up the A record for your
nameservers, which makes your webpages slower to load.  And everyone
wants their webpages to load faster.


The fact that the root name servers don't supply AAAA glue for
gTLDs/ccTLDs is a minor annoyance; people should in general only go to
the root name servers once a day per gTLD/ccTLD.  There are 267 TLDs
and you're unlikely to talk to them all in a given day, but almost every
request your name server makes is going to start with a query to a gTLD
or ccTLD server.




Re: Security gain from NAT (was: Re: Cool IPv6 Stuff)

2007-06-05 Thread Perry Lorier




The only ways into these machines would be if the NAT/PAT device were
misconfigured, another machine on the secure network were compromised, or
another gateway into the secure network was set up. Guess what? All of these
things would defeat a stateful inspection firewall as well.
  
I disagree.  (All of the below is hypothetical, I haven't tested it, but 
I believe it to be true.)


Premise 1: The machines behind the firewall are actually on and
functioning, and may even be in use.


Premise 2: The OSes on the machines will periodically generate *some*
kind of traffic.  Common examples might be NTP synchronisation, or DNS
resolution of an update service for antivirus, OS patches, whatever.
The traffic may also be generated by the user actually using the
machine for whatever real users actually do.


Premise 3: Many NAPTs are of the "cone" type.  This is desirable for
end users as it allows their applications/devices to use their
NAPT-busting technologies (STUN, Teredo, etc.) without having to
configure static port forwards.


Premise 4: The external port chosen for an outgoing protocol is easily
guessed.  Many NAPT boxes will prefer to use the same port as the
original host, or will assign port mappings sequentially; a bit of
research here would go a long way, since entire networks are likely to
be using the same NAPTs in an ISP-provided CPE.


Thus, for example, if you are running a single host behind a NAPT box
that is doing regular NTP queries, and I can guess the external port on
the NAPT box (which, with a bit of research, I suspect is trivial), then
I can send a packet to that port on your external IP and it will be
forwarded back to your machine.  This could easily lead to a compromise
via a buffer overflow or other exploit.
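
A minimal sketch of what "send that port a packet" amounts to (the
address and port here are hypothetical, and the payload is harmless; a
real attack would carry an exploit for whatever service sits behind the
mapping):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send one UDP datagram to a guessed external mapping.  If the NAPT is
   full-cone and the guess is right, the packet reaches the inside host. */
int send_probe(const char *external_ip, uint16_t guessed_port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in dst;
    const char payload[] = "probe";

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(guessed_port);   /* e.g. 123 if the inside host runs NTP */
    inet_pton(AF_INET, external_ip, &dst.sin_addr);
    sendto(fd, payload, sizeof(payload), 0,
           (struct sockaddr *)&dst, sizeof(dst));
    close(fd);
    return 0;
}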


This would primarily work for UDP-based services that by design tend to
be used over the Internet itself, such as DNS, NTP, SIP, etc.  It seems
unlikely that this would work against TCP-based services.  Exploits in
ICMP could also be tunneled back through a NAPT box in a similar
manner.  GRE/IPIP/IPv6/ESP/AH can probably use similar techniques to
infect machines behind a NAPT box.  (Disclaimer: I don't know those
protocols very well, but on the flip side, I suspect that NAPT boxes
don't know them very well either and do dumb things with them, like
forwarding all GRE packets to the one host inside your network that has
ever spoken GRE.)


Just because you've never seen someone exploit through a NAPT box 
doesn't mean it won't happen. 





Re: DHCPv6, was: Re: IPv6 Finally gets off the ground

2007-04-15 Thread Perry Lorier




When you can plug your computer in, and automatically (with no
clicking) get an IPv6 address, 


Router Advertisements let you automatically configure as many IPv6 
addresses as you feel like.


 have something tell you where your DNS assist servers,

Microsoft had an old expired draft with some default anycast IPv6 
nameserver addresses:


   fec0:0:0:ffff::1
   fec0:0:0:ffff::2
   fec0:0:0:ffff::3

-- http://tools.ietf.org/id/draft-ietf-ipv6-dns-discovery-04.txt

While this was never accepted by the IETF, I believe Windows machines 
still use these by default if they have no other name servers but do 
have IPv6 connectivity.


This could become a fairly simple de facto standard if network operators 
started using it.  It is an obvious weak link in the chain at this 
point, though.


 configure web proxies,

Once you have DNS you can use the WPAD proxy auto-discovery thingamabob.

and solve your dynamic dns problems (as IPv4 set top boxes do today), 


Updating your forward/reverse DNS via DNS UPDATE messages isn't that 
uncommon today.


See:
http://www.caida.org/publications/presentations/ietf0112/dns.damage.html

where hosts are trying to update the root zone with their new names.

So you can get from A to D without requiring DHCPv6.


Re: Thoughts on increasing MTUs on the internet

2007-04-12 Thread Perry Lorier


Iljitsch van Beijnum wrote:


Dear NANOGers,

It irks me that today, the effective MTU of the internet is 1500 bytes, 
while more and more equipment can handle bigger packets.


What do you guys think about a mechanism that allows hosts and routers 
on a subnet to automatically discover the MTU they can use towards other 
systems on the same subnet, so that:


1. It's no longer necessary to limit the subnet MTU to that of the least 
capable system


2. It's no longer necessary to manage 1500 byte+ MTUs manually

Any additional issues that such a mechanism would have to address?


I have a half-completed prototype "mtud" that runs under Linux.  It 
sets the interface MTU to 9k, but sets the route for the subnet down to 
1500.  It then watches the ARP table for new entries.  As a new MAC is 
added, it sends a 9k UDP datagram to that host and listens for an ICMP 
port unreachable reply (like traceroute does).  If the error arrives, 
it assumes that host can receive packets that large, and adds a host 
route with the larger MTU to that host.  It steps the MTU up from 1500 
to 16k, trying to rapidly increase the MTU without having to wait for 
annoying timeouts.  If anything goes wrong somewhere along the way 
(a host is firewalled or whatever), then it won't receive the ICMP 
reply, and won't raise the MTU.
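
For the curious, a rough approximation of the per-host probe under
Linux (a sketch under assumptions: IP_PMTUDISC_PROBE is available so
the datagram isn't silently fragmented down to the 1500-byte route MTU,
and the connected-UDP trick of seeing ECONNREFUSED is used to observe
the ICMP port unreachable; this is not the actual mtud code):

#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>   /* IP_MTU_DISCOVER, IP_PMTUDISC_PROBE (Linux) */
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns 1 if the neighbour answered a probe of 'size' bytes on the wire. */
int probe_mtu(struct in_addr neighbour, int size)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int pmtu = IP_PMTUDISC_PROBE;             /* set DF, ignore route MTU */
    struct timeval tv = { 1, 0 };             /* 1 second probe timeout */
    struct sockaddr_in dst = { .sin_family = AF_INET,
                               .sin_port = htons(33434),  /* traceroute-style port */
                               .sin_addr = neighbour };
    char *buf = calloc(1, size);
    char reply[16];
    int ok = 0;

    setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &pmtu, sizeof(pmtu));
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
    connect(fd, (struct sockaddr *)&dst, sizeof(dst));

    if (send(fd, buf, size - 28, 0) > 0 &&    /* 28 = IP + UDP header overhead */
        recv(fd, reply, sizeof(reply), 0) < 0 && errno == ECONNREFUSED)
        ok = 1;                               /* ICMP port unreachable came back */

    free(buf);
    close(fd);
    return ok;
}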


The idea is that you can run this on routers/servers on a network that 
has 9k MTUs but where not all the hosts are assured to be 9k capable, 
and it will correctly detect the available MTU between servers or 
routers, while still being able to talk correctly to machines that are 
still stuck with 1500-byte MTUs.


As another interesting data point in this area, a while ago we had 
reason to do some throughput tests under Linux, varying the MTU, using 
e1000s, and ended up with this pretty graph:


http://wand.net.nz/~perry/mtu.png

We never had the time to investigate exactly what was going on, but 
interestingly at 8k MTUs (which is presumably what NFS would use), 
performance is exceptionally poor compared to 9k and 1500-byte MTUs. 
Our (untested) hypothesis is that the Linux kernel driver isn't smart 
about how it allocates its buffers.





Re: wifi for 600, alex

2007-01-23 Thread Perry Lorier




An observation I would make is that the number of mac addresses per
person at the tech heavy meeting has climbed substantially over 1 (not
to 2 yet) so it's not so much that everyone brings a laptop... it's that
everyone brings a laptop, a pda and a phone, or two laptops. In a year
or two we'll be engineering around 2 radio's per person in five years
who knows.


We ran the wireless network at LCA '06.  Due to abuse at LCA '05 we 
required everyone to register their MAC address against their 
registration code before we let them onto the network.  This means we 
have a nice database mapping MACs to people.


We saw:
199 people with 1 MAC address registered
102 people with 2 MAC addresses registered
9   people with 3 MAC addresses registered
5   people with 4 MAC addresses registered

1   person with 6 mac addresses registered

We did have a lot of problems with devices that didn't have a web 
browser (so their owners had to ask us to add their MACs manually; there 
were 11 people in this situation who aren't accounted for above).  
Mostly VoIP phones, but it's amazing how many people have random bits of 
hardware that will do wifi!


This is perhaps biased, as wired ethernet was also available to some 
people in their rooms (about 50 rooms IIRC), so some of those 102 
people would have had one MAC for their wireless and a separate MAC for 
their wired access.


We also ran soft APs on Soekris boxes running Linux, so we could hook 
into the AP at a fairly low level.  We firewalled all DHCP replies 
inside the AP so it wouldn't forward any DHCP replies received from the 
wireless to another client on the AP or onto the physical L2.[1]


As an experiment we firewalled *all* ARP inside the APs, so ARP spoofing 
was impossible.  ARP queries were snooped, an omapi query was sent to 
the DHCP server asking who owned the lease, and an ARP reply was unicast 
back to the original requester.[2]  This reduced the amount of 
multicast/broadcast (which wireless sends at the basic rate) on the 
network, as well as preventing people from stealing IPs and ARP spoofing.
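
For illustration, a rough sketch of the unicast-ARP-reply half of that
(Linux packet sockets; the owner's MAC/IP would come from the omapi
lookup and the asker's from the snooped query, neither of which is
shown, and all names here are hypothetical rather than our actual code):

#include <arpa/inet.h>
#include <linux/if_packet.h>
#include <net/ethernet.h>
#include <net/if_arp.h>
#include <netinet/if_ether.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int send_unicast_arp_reply(int ifindex,
                           const uint8_t owner_mac[6], uint32_t owner_ip,
                           const uint8_t asker_mac[6], uint32_t asker_ip)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETHERTYPE_ARP));
    struct {
        struct ether_header eth;
        struct ether_arp    arp;
    } __attribute__((packed)) frame;
    struct sockaddr_ll dst = { .sll_family = AF_PACKET,
                               .sll_protocol = htons(ETHERTYPE_ARP),
                               .sll_ifindex = ifindex,
                               .sll_halen = 6 };
    int rc;

    memcpy(dst.sll_addr, asker_mac, 6);

    memcpy(frame.eth.ether_dhost, asker_mac, 6);   /* unicast, not broadcast */
    memcpy(frame.eth.ether_shost, owner_mac, 6);
    frame.eth.ether_type = htons(ETHERTYPE_ARP);

    frame.arp.arp_hrd = htons(ARPHRD_ETHER);
    frame.arp.arp_pro = htons(ETHERTYPE_IP);
    frame.arp.arp_hln = 6;
    frame.arp.arp_pln = 4;
    frame.arp.arp_op  = htons(ARPOP_REPLY);
    memcpy(frame.arp.arp_sha, owner_mac, 6);
    memcpy(frame.arp.arp_spa, &owner_ip, 4);
    memcpy(frame.arp.arp_tha, asker_mac, 6);
    memcpy(frame.arp.arp_tpa, &asker_ip, 4);

    rc = sendto(fd, &frame, sizeof(frame), 0,
                (struct sockaddr *)&dst, sizeof(dst));
    close(fd);
    return rc;
}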


To stop people from spoofing someone else's MAC, we also kept lists of 
which AP each MAC was associated with; if a MAC was associated with more 
than one AP we could easily blacklist it and visit people in the area 
with a baseball bat.


We didn't see much abuse (and didn't have people complain about abuse, 
so I guess it's not just that they hid it from us).  I think this is 
mostly because people knew we had IP-to-MAC-to-name mappings, and 
abusers knew they could easily be tracked down.


One of the more interesting things was that during the daytime we were a 
net importer of traffic as people did their usual web surfing, but at 
about 10pm at night we suddenly became a net exporter as people started 
uploading all their photos to flickr.



[1]: All client to client traffic in managed mode is relayed via the AP.

[2]: Amusing story: one of the developers had written a patch to detect 
whether someone else was using the same IP on the same L2 and produce a 
warning.  He tried it on our network and found that it didn't work. 
After much head scratching he discovered what we were doing :)


Re: Network end users to pull down 2 gigabytes a day, continuously?

2007-01-21 Thread Perry Lorier




Good thinking. Where do I sign? Regarding your first point, it's really
surprising that existing P2P applications don't include topology awareness.
After all, the underlying TCP already has mechanisms to perceive the
relative nearness of a network entity - counting hops or round-trip 
latency.

Imagine a BT-like client that searches for available torrents, and records
the round-trip time to each host it contacts. These it places in a lookup
table and picks the fastest responders to initiate the data transfer. Those
are likely to be the closest, if not in distance then topologically, and 
the

ones with the most bandwidth. Further, imagine that it caches the search -
so when you next seek a file, it checks for it first on the hosts 
nearest to

it in its routing table, stepping down progressively if it's not there.
It's a form of local-pref.


When I investigated bittorrent clients a couple of years ago, the 
tracker would only send you a small subset of its peers at random, so 
as a client you often weren't told about the peer that was right beside 
you.  Trackers could in theory send you peers that are close to you 
(e.g. send you anyone that's in the same /24, a few from the same /16, 
a few more from the same /8, and a handful from other places).  But the 
tracker has no idea which areas you get good speeds to, and generally 
wants to be as simple as possible.


Also, on most unixes you can query the TCP stack to ask for its current 
estimate of the RTT on a TCP connection with:


#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/tcp.h>
#include <stdio.h>

int fd;                  /* an already-connected TCP socket */
struct tcp_info tcpinfo;
socklen_t len = sizeof(tcpinfo);

if (getsockopt(fd, SOL_TCP, TCP_INFO, &tcpinfo, &len) != -1) {
  /* tcpi_rtt is reported in microseconds */
  printf("estimated rtt: %.04f (seconds)\n", tcpinfo.tcpi_rtt / 1000000.0);
}

Due to rate limiting you'll often find you get very similar performance 
from a reasonably large subset of your peers, so using TCP's RTT 
estimate as a tie breaker might provide a reasonable cost saving to the 
ISP (although the end user probably won't notice the difference).




Re: 6to4 gateways

2005-10-17 Thread Perry Lorier


 
 This seems like a problem that could be solved in the
 style of the CIDR report. Regular weekly reports of 
 v6 relays and locations as seen from various major ASes.

From my tr website I can see a few 6to4 gateways:
http://tr.meta.net.nz/output/2005-10-17_22:41_192.88.99.1.png
(beware, the image is extremely large, and can kill some browsers on
lower end machines).

Most of my source nodes are unfortunately in NZ, which limits the number
of relays seen.