Bill Pringlemeir wrote:
> On 18 Sep 2005, [EMAIL PROTECTED] wrote:
 
> > Don't misunderstand me: I really love HTTP, I know it well enough.
> > But it has an intrinsic overhead that I think is too large for GWCs.
 
> Ok, I didn't find too much on UHC.  However, it seems to be a lot
> better than GWC because regular nodes on the network are used afaiu?

In theory, yes. Gtk-Gnutella doesn't really do that. That alone
doesn't solve the bootstrap problem, though, and I don't think
Gtk-Gnutella ever needs to fall back to a cache once it has been
connected. However, if some condition makes it impossible for
Gtk-Gnutella to connect to peers, it will happily contact the
caches over and over again. That's why I bumped the UHC lock to
24 hours. Someone else reduced it to 10 minutes. In return I
increased it back to at least an hour. You see, negotiating
reasonable timeouts works like a bazaar (just in case you thought
it was based on logic or something).
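
To illustrate, a minimal sketch of the kind of per-cache lock I mean
(invented names, not the actual gtk-gnutella code; the one-hour value
is the current compromise):

    /* Per-cache re-contact lock: a UHC may only be queried again
     * once the lock interval has expired. */
    #include <stdbool.h>
    #include <time.h>

    #define UHC_LOCK_SECONDS (60 * 60)   /* one hour */

    struct uhc_entry {
        char   host[256];        /* hostname or IP of the cache */
        time_t last_contacted;   /* 0 if never contacted */
    };

    static bool
    uhc_may_contact(const struct uhc_entry *e, time_t now)
    {
        return e->last_contacted == 0 ||
               now - e->last_contacted >= UHC_LOCK_SECONDS;
    }

    static void
    uhc_mark_contacted(struct uhc_entry *e, time_t now)
    {
        e->last_contacted = now;
    }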

> So, I would propose a server that only handles "ip", "refer" and
> "vote".  I am using the "ip" like the GWC concept.  The mechanism
> could work like this,

> Client A connects to bootstrap server and supplies its IP and port.
> A secret key (Sa) is sent to the client A.

That doesn't look like a good idea. The cache must connect to clients
to gather addresses, not the other way around; otherwise the cache
will explode sooner or later. In fact, updates are already more
frequent than all other requests at the GWebCaches, and that's a real
problem with TCP.
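
Just to make clear what I mean by "the cache connects to clients":
something along these lines, where the probe function is purely
hypothetical and stands in for whatever handshake or UDP ping the
cache would actually use:

    /* The cache initiates connections to candidate peers itself and
     * only lists those it could reach, so it never has to trust
     * pushed updates. */
    #include <stdbool.h>
    #include <stddef.h>

    struct peer {
        char addr[64];   /* "ip:port" of a candidate peer */
        bool listed;     /* true once the cache has verified it */
    };

    bool try_gnutella_handshake(const char *addr);  /* hypothetical probe */

    static void
    crawl_candidates(struct peer *candidates, size_t n)
    {
        size_t i;

        for (i = 0; i < n; i++)
            candidates[i].listed =
                try_gnutella_handshake(candidates[i].addr);
    }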

Personally, I'd prefer it if you could describe your scheme without
the encryption (as you say, it's just a gimmick anyway); I'm not
sure I really understood your proposal. Also, passing information
about caches via client-to-client transfers should be reconsidered
carefully. If anyone implemented that to publish GWebCache URLs,
the sky would almost certainly come down. Probably not a single
vendor checked and normalized URLs, and most of them *still* don't.
I wouldn't be surprised at all if some clients considered
blah.example.org and BLAH.example.org to be different hosts and
happily tried both variants to "bootstrap". In the GWebCache
system, the caches act as a filter - at least some of them do.
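
For the record, the kind of normalization I'm talking about is
nothing fancy; a sketch (it only lower-cases the host part; real
normalization would also have to handle default ports, trailing
slashes, percent-encoding and so on):

    #include <ctype.h>
    #include <string.h>

    /* Lower-case the host part of a cache URL in place so that
     * "BLAH.example.org" and "blah.example.org" collapse into one
     * entry. */
    static void
    normalize_host_in_url(char *url)
    {
        char *p = strstr(url, "://");

        if (p == NULL)
            return;
        for (p += 3; *p != '\0' && *p != '/' && *p != ':'; p++)
            *p = (char) tolower((unsigned char) *p);
    }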

One problem I see with publishing and collecting UHCs by hostname
is that someone could eventually point dozens or hundreds of them
at a single IP address (or range of addresses) to bring a host or
network down.
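
One possible (purely hypothetical) counter-measure would be to
resolve published hostnames and refuse to keep more than a couple
of cache entries per resolved address, so a thousand names pointing
at one victim collapse to one entry; roughly:

    #include <netdb.h>
    #include <stdbool.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    #define MAX_CACHES_PER_IP 2   /* arbitrary illustration value */

    /* Hypothetical lookup into the local cache list. */
    int count_caches_with_ip(const char *ip);

    static bool
    uhc_hostname_acceptable(const char *hostname)
    {
        struct addrinfo hints, *res;
        char ip[64];
        bool ok = false;

        memset(&hints, 0, sizeof hints);
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_DGRAM;

        if (getaddrinfo(hostname, NULL, &hints, &res) != 0)
            return false;
        if (getnameinfo(res->ai_addr, res->ai_addrlen, ip, sizeof ip,
                        NULL, 0, NI_NUMERICHOST) == 0)
            ok = count_caches_with_ip(ip) < MAX_CACHES_PER_IP;
        freeaddrinfo(res);
        return ok;
    }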

The central issue, however, is (just as in kindergarten)
discipline. Few clients (if any) truly use the caches as a
bootstrap system. Most of them fall back to the caches far too
easily and do not restrict themselves to an acceptable number of
requests. You can only fix your own software; talking to other
developers is (usually) about as satisfying as a discussion with
a brick wall.

Anyway, your vote mechanism might be redundant. Gnutella peers
constantly exchange fresh peer addresses. This works in-band as
well as out-of-band. Peers can (and most do) indicate their
average daily uptime as well as UHC support. So basically, once
you're on the net, the client simply has to collect addresses of
peers with high uptimes (but also some with lower uptimes, for
diversity). Then it should ping them once in a while
(out-of-band) to see whether they're still online.
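
The selection step is trivial; a sketch of what I have in mind
(the ping logic itself is left out, and the field names are made up):

    #include <stdlib.h>
    #include <time.h>

    struct cached_peer {
        char     addr[64];        /* "ip:port" */
        unsigned daily_uptime;    /* seconds/day, as advertised */
        time_t   last_seen_alive; /* updated by the out-of-band ping */
    };

    /* Sort the hostcache so peers advertising the highest average
     * daily uptime come first; keep a few low-uptime ones anyway
     * for diversity. */
    static int
    by_uptime_desc(const void *a, const void *b)
    {
        const struct cached_peer *pa = a, *pb = b;

        return (int) pb->daily_uptime - (int) pa->daily_uptime;
    }

    static void
    sort_hostcache(struct cached_peer *cache, size_t n)
    {
        qsort(cache, n, sizeof *cache, by_uptime_desc);
    }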

This way your local hostcache should almost always be usable,
even after being offline for a week. However, I don't think this
really solves the "bootstrap" problem. And you cannot prevent
peers from banging the caches anyway (due to sloppy coding, or
quite intentionally).

I considered using "cheap" web servers that would simply serve a
hostcache file (just a list of peer addresses). This wouldn't
require any CGI or custom server software. The file itself would
be pushed out periodically by the intelligently designed caches
and would also contain a timestamp (since you can't trust
Last-Modified), so that clients don't use peer addresses from
stale caches. This way you could easily install hundreds of dumb
caches. Of course, access permissions could be a problem if those
are not your own servers but servers run by volunteers. Still,
there are countless ways to transfer files (SMTP, HTTP, HTTPS,
FTP, something custom), and an update frequency of something like
10 minutes should be sufficient.
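
To give an idea, such a dumb hostcache file could be as simple as a
Unix timestamp on the first line followed by one "ip:port" per line;
the format and the 24-hour limit below are just made up for
illustration:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define MAX_CACHE_AGE (24 * 60 * 60)  /* ignore files older than a day */

    static int
    read_hostcache(const char *path)
    {
        FILE *f = fopen(path, "r");
        char line[128];
        long stamp;

        if (f == NULL)
            return -1;
        if (fgets(line, sizeof line, f) == NULL) {
            fclose(f);
            return -1;
        }
        stamp = strtol(line, NULL, 10);
        if (time(NULL) - (time_t) stamp > MAX_CACHE_AGE) {
            fclose(f);                /* stale cache: don't trust it */
            return -1;
        }
        while (fgets(line, sizeof line, f) != NULL)
            printf("candidate peer: %s", line);  /* feed the hostcache */
        fclose(f);
        return 0;
    }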

Now something that is also very important: no system will ever
scale if clients fall back to *all* the caches they know about.
Basically, if there are 10000 caches, clients would bang 10000
caches instead of 20. OK, the time needed to contact all of them
would buy you some delay, but in the end it still sucks.

Therefore, clients must really *give up* *xor* use sufficiently
large *exponential* delays when contacting caches. That may sound
simple, but how do you teach a brick?
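
Spelled out, the delay I'd like to see looks something like this
(base delay, cap and give-up threshold are invented values):

    #include <stdio.h>

    #define BASE_DELAY   600          /* 10 minutes */
    #define MAX_DELAY    (24 * 3600)  /* cap at one day */
    #define MAX_ATTEMPTS 10           /* then give up until restarted */

    /* Double the wait after every failed round of cache requests,
     * up to the cap; give up entirely after too many attempts. */
    static long
    cache_retry_delay(int failed_attempts)
    {
        long delay = BASE_DELAY;

        if (failed_attempts >= MAX_ATTEMPTS)
            return -1;                /* give up */
        while (failed_attempts-- > 0 && delay < MAX_DELAY)
            delay *= 2;
        return delay < MAX_DELAY ? delay : MAX_DELAY;
    }

    int
    main(void)
    {
        int i;

        for (i = 0; i < 12; i++)
            printf("attempt %d -> wait %ld s\n", i, cache_retry_delay(i));
        return 0;
    }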

-- 
Christian
