Hi Russ,

Uptime patterns are fairly interesting. We looked at uptime datasets from web servers, desktop PCs at Microsoft, and PlanetLab nodes, and Saroiu, Gummadi, and Gribble looked closely at the lifetimes of Gnutella and Napster participants in their paper titled "Measuring and Analyzing the Characteristics of Napster and Gnutella Hosts." Some quick observations:
On average, past uptime performance is predictive of future uptime. "Random" events that take down a node (e.g., software updates, power outages, someone bumping into the power button) happen to any given node with approximately the same probability over time. This is great because it implies that outages are not truly random events, and it opens up the use of history to predict future lifetimes.

Some classes of nodes are best characterized as bimodal: they are either in a phase where they experience long periods between reboots (presumably production/usage mode) or in a phase where they are rebooted very frequently (presumably experiencing some kind of distress, e.g., hardware problems, someone tweaking software configs, etc.).

Saroiu et al. studied Gnutella and Napster host lifetimes in detail at the IP level. The good news is that P2P participants seem to behave similarly regardless of protocol. Unfortunately, it does not seem possible from their graphs to check whether Petar's observation holds. And even if it does hold, taking advantage of it would require knowing not only that the failure probability drops with age, but precisely how it drops mathematically.

In any case, exponential backoff is a terrible way to ping nodes. It backs off so aggressively that moderately long-lived nodes will not be well served. Short-lived nodes will be fine; they'll get pinged frequently. Really long-lived nodes will be OK; sure, the first bunch of pings is wasted, but after a relatively quick flurry these nodes will hardly be pinged at all, which is fine for a node like my mail server that goes down every few years. It's the nodes in the middle of the lifetime distribution that will not be pinged frequently enough -- their failures will be missed by the failure detector and lead to pesky app-level timeouts.

It turns out that you want to ping nodes in proportion to the square root of their expected lifetimes. Hence sqrt(s) for the name of the detector and the project.
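To make the square-root rule concrete, here is a toy sketch (not the algorithm from the paper -- see the citation below for that). It assumes one reading of the rule: under a fixed total ping budget, each node's ping interval grows with the square root of its estimated expected lifetime, so mid-lifetime nodes are probed far more often than exponential backoff would allow. The function name and the assumption that lifetime estimates are given externally are mine:

```python
import math

def ping_intervals(expected_lifetimes, budget):
    """Split a total ping budget (pings per unit time) across nodes.

    Each node's ping rate is proportional to 1/sqrt(expected lifetime),
    i.e., its ping interval is proportional to sqrt(expected lifetime).
    Returns the ping interval for each node, in the same time units as
    the lifetimes.
    """
    # Weight each node by 1/sqrt(lifetime); normalize so the rates
    # sum to the total budget.
    weights = [1.0 / math.sqrt(t) for t in expected_lifetimes]
    total = sum(weights)
    # rate_i = budget * w_i / total; the interval is its reciprocal.
    return [total / (budget * w) for w in weights]
```

For example, a node expected to live 4x longer than another gets a ping interval only 2x longer, not the exponentially longer gap that backoff would assign.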
The details are in our paper:

Latency- and Bandwidth-Minimizing Optimal Failure Detectors. Kelvin So and Emin Gün Sirer. In Proceedings of the European Conference on Computer Systems, Lisbon, Portugal, March 2007.
http://www.cs.cornell.edu/People/egs/sqrt-s/papers.php

By the way, many other interesting apps in other domains are similar to the (rather ho-hum) failure detection problem. E.g., detecting when web pages/blogs/profiles are modified so they can be indexed, sensing events in a sensor network with limited battery capacity, etc.

	Gün.

On Thu, 2009-03-26 at 15:25 -0700, Russ Weeks wrote:
> 2009/3/26 Emin Gün Sirer <e...@systems.cs.cornell.edu>
>         This discussion touches on a related & elegant problem:
>
>         Suppose a peer is tasked with monitoring the liveness of N other
>         peers by pinging them periodically. What's the optimal strategy
>         it should use for pinging those N peers?
>
> That's a really interesting question... Petar Maymounkov, in his
> description of Kademlia, shows that in the Gnutella network, the
> longer a node has been alive, the more likely it is to stay alive.
> Does this suggest that the liveness "pings" should follow an
> exponential backoff as well? (in addition to the NAT traversal pings,
> I mean) Or is this behaviour likely specific to Gnutella and is
> unlikely to be found in other P2P systems?
>
> Russ
> _______________________________________________
> p2p-hackers mailing list
> p2p-hackers@lists.zooko.com
> http://lists.zooko.com/mailman/listinfo/p2p-hackers