Raphael Manfredi wrote:
> Quoting Christian Biere <[EMAIL PROTECTED]> from ml.softs.gtk-gnutella.devel:
> :In theory, yes. Gtk-Gnutella doesn't really do that. However, that
> :alone doesn't really solve the bootstrap problem, and I don't think
> :Gtk-Gnutella ever needs to fall back to a cache once it has been
> :connected. However, if there's a condition that makes it impossible
> :for Gtk-Gnutella to connect to peers, it will happily contact the
> :caches over and over again. That's why I bumped the UHC lock to
> :24 hours. Someone else reduced it to 10 minutes.
> 
> Hmm... "Someone else"?
> 
> The problem with UHCs arises when you're UDP-firewalled.  Currently, we don't
> use the lack of replies from UHCs as an indication that we are
> UDP-firewalled, because some of the UHCs in our hardwired list could be
> down, and that conclusion could be wrong.  However, after, say, 10
> unsuccessful attempts at contacting a UHC, GTKG should consider itself
> definitively UDP-firewalled, no longer attempt to contact UHCs, and fall
> back to GWCs.

It would also be better if the user had a chance to figure out what
the hell is going on before GTKG runs off contacting caches and
tons of peers, before there has even been a chance to set up at least
a half-decent configuration. Even if that worked for 90% of all people,
the other 10% can be a serious issue with respect to cache resources.
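
Implemented, that 10-attempt heuristic wouldn't need to be more than a
counter. Here's a minimal C sketch of it; every identifier and the
threshold are invented for illustration, nothing here is actual GTKG code:

  /*
   * Sketch of the "10 strikes and you're UDP-firewalled" heuristic.
   * All identifiers are hypothetical; illustration only.
   */
  #include <stdbool.h>

  #define MAX_UHC_FAILURES 10    /* unanswered UHC pings before giving up */

  static unsigned uhc_failures;       /* consecutive UHC pings without a pong */
  static bool assume_udp_firewalled;  /* once set, bootstrap via GWCs only */

  /* Call when a UHC ping times out without any pong coming back. */
  static void uhc_ping_timed_out(void)
  {
      if (!assume_udp_firewalled && ++uhc_failures >= MAX_UHC_FAILURES)
          assume_udp_firewalled = true;  /* stop hitting UHCs, use GWCs */
  }

  /* Call when any UHC pong arrives: UDP demonstrably works. */
  static void uhc_pong_received(void)
  {
      uhc_failures = 0;
      assume_udp_firewalled = false;
  }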

> The only downside to this is that we don't know that we "don't know yet".
> When GTKG starts up for the first time, it has to assume it is completely
> firewalled, yet we cannot let it contact the GWCs immediately: we prefer
> the lower-cost UHCs at that point.
> 
> :In return, I increased it back to at least an hour. You see, negotiating
> :reasonable timeouts works like a bazaar (just in case you thought it was
> :based on logic or something).
> 
> Timeouts are black magic.  Only experimentation can determine whether they
> are appropriate.

I definitely checked the worst-case scenario and decided that the chosen
timeouts were far too low. Too low even to fix any (firewall) problems
while the client is still trying to connect. So if there's a real problem,
all those attempts are wasted anyway.

> 24 hours was far too large, 10 minutes was probably
> too low.  Yet, as you said, GTKG will only contact a cache when its
> internal cache is empty.  So we know that it will not abuse caches when
> things work out fine.  It's when things do not proceed as planned (e.g. the
> UHC pongs get lost or are blocked by a firewall) that things start to
> be interesting...

Well, it also contacts caches when the number of cached peers falls
below the number of missing hosts, which is all too often caused by
throwing usable IP addresses away.

> :The central issue, however, is (just as in kindergarten) discipline.
> :Few clients (if any) truly use the caches as a bootstrap system. Most
> :of them fall back to them far too easily and do not restrict
> :themselves to an acceptable number of requests. You can only fix
> :your own software; talking to other developers is (usually) as
> :satisfying as a discussion with a brick wall.

> It's true that some clients are abusing the caches.  However, I suspect
> only GTKG has such a large host cache.

This doesn't help much when it's banging away at 40 connection attempts
per second, especially when it's running unattended. When the router
craps out and you're offline for a while (which doesn't seem horribly
unlikely), the cache gets emptied no matter what. Yes, it'll detect
being offline, but if the connection is flapping, that detection fails.

> Other clients cache only a small
> number (100 or so) of IP addresses, and so it's entirely possible that those
> addresses no longer work when the client is stopped for some time.

But they probably check the uptime of those hosts, something GTKG
doesn't do.
 
> :Now something which is also very important: no system will ever
> :scale if clients fall back to *all* the caches they know about. Basically,
> :if there are 10000 caches, clients would bang on 10000 caches instead
> :of 20. Ok, the time necessary to connect to all of them would buy
> :you some delay, but in the end it still sucks.

> What matters is the quality of the data in the caches.  Apart from badly
> written clients, no one would do that if they connect all right on the
> first GWC connection.

Yes, maybe *if* they connect all right. If they don't, they'll keep
contacting all the caches they know about, probably forever and ever
unless shut down at some point. GTKG does exactly the same. Just block
UDP and set tls_enforce to TRUE in order to simulate the BISPFH.
After about 30 minutes, GTKG will have tried all UHCs three times and
all GWebCaches. Keep in mind that just because you don't see any
real traffic, your UHC requests might very well have hit the UHCs, and
the GWebCaches may have been hit even though you just saw a hang-up.

And exactly this is what happens when you run a Gnutella peer behind
an .edu firewall. They use L7 filters; that's absolutely obvious
from the symptoms. Of course, any badly configured firewall
can have the same or similar effects.
 
> :Therefore, clients must really *give up* *xor* use sufficiently large
> :*exponential* delays when contacting caches. That may sound simple,
> :but how do you teach a brick?

> Exponential is good up to a certain point.  The real issue here is that
> the first cache contacted should allow the client to bootstrap.  No sane
> client should contact more than a few caches, and they have to do that
> because data quality in some caches is poor.

Well, but as I wrote: if you contact all of them because none seems
to work, you effectively hit them all, and the system does not scale
at all. That's as if all TCP/IP hosts were configured to fall back
to the root DNS servers whenever a resolution fails. Of course that's
not the case; you only use 1-3 public/private DNS servers, and the
rest is handled by a hierarchy of cascaded caches.
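
To make the give-up-xor-back-off point concrete, here's a rough C
sketch. The constants and every identifier are invented for the
example; no client actually has this code:

  /*
   * Sketch of exponential backoff for cache requests, with a hard
   * give-up cap.  All names and constants are hypothetical.
   */
  #include <time.h>

  #define CACHE_BASE_DELAY   60   /* first retry after one minute */
  #define CACHE_MAX_ATTEMPTS 8    /* after that, give up until restart */

  static unsigned cache_attempts;      /* cache requests made so far */
  static time_t   cache_next_allowed;  /* earliest time for the next one */

  /* Returns 1 if contacting a cache is permitted right now, else 0. */
  static int cache_request_allowed(void)
  {
      time_t now = time(NULL);

      if (cache_attempts >= CACHE_MAX_ATTEMPTS)
          return 0;    /* the *give up* branch of the xor */
      if (now < cache_next_allowed)
          return 0;    /* still backing off */

      /* The delay doubles each round: 1, 2, 4, 8, ... minutes. */
      cache_next_allowed = now + (CACHE_BASE_DELAY << cache_attempts);
      cache_attempts++;
      return 1;
  }

With 8 attempts, the last wait is over two hours, and after that the
client leaves the caches alone entirely. That is the kind of curve the
system needs in order to scale.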

> If people can't bootstrap easily, they will go away from Gnutella.
> I'd say if after 10 minutes they're not connected, they'll switch to
> some other network after a first try.

If the caches are overloaded, there's nothing to bootstrap from anyway.
The users may as well go away before the network does. You can't enforce
plug and play. If it doesn't connect, let them figure out how to
fix it and retry then. Retrying aggressively doesn't get you anywhere
and may even cause trouble for the network as a whole.

> So setting up timeouts of 24 hours is not doing any good there.

Either you or I must be misunderstanding this 24-hour timeout. It
only becomes fully effective after Gtk-Gnutella has contacted all
known UHCs 3 times. What magic do you expect to happen? Sure, this
might be overkill in some rare cases, but then you can simply restart
Gtk-Gnutella to get the next round. Maybe you should (have) check(ed)
main.log more often. There are clearly a couple of GTKGs that
have problems getting connected (presumably due to a mismatch of
external/internal port) and ping the UHCs over and over again. Fortunately,
the number of GTKG users, and of GTKG users affected by such problems,
is sufficiently low for now. However, the caches are almost on
their knees already, while Gnutella is still supposed to grow.

Also, again and again: it's called "bootstrap". What business does
any client have with those caches *hours* after it was started and
after it actually was connected to several peers? There's
really no point in contacting them ever again once you've found a single
running peer. Clients should reconnect for more X-Try-* headers, or better
yet re-ping peers over UDP using GGEP SCP, but they must not contact
a cache again. And please don't tell me you're worried about
clients getting hit too often then. Obviously few care how often
a cache gets hit (in the worst case).
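
Spelled out as code, that policy is tiny. A hedged sketch follows: the
GGEP "SCP" extension is real (a UDP ping carrying it asks a peer for
cached addresses, returned as packed IP:port entries), but all of the
helper functions and names below are hypothetical, not GTKG API:

  #include <stdbool.h>

  /* Hypothetical helpers, prototypes only. */
  extern unsigned host_pool_count(void);     /* peers currently cached */
  extern void udp_ping_pool_with_scp(void);  /* UDP ping with GGEP "SCP" */
  extern void contact_bootstrap_cache(void); /* UHCs first, then GWCs */

  static bool ever_connected;  /* set once any handshake has succeeded */

  static void need_more_hosts(void)
  {
      if (host_pool_count() > 0) {
          /* Re-ping known peers over UDP with GGEP SCP: they answer
           * with more addresses, so no cache is ever involved. */
          udp_ping_pool_with_scp();
          return;
      }
      if (ever_connected) {
          /* We were on the network before; wait for SCP pongs or for
           * old peers to return rather than hitting a cache again. */
          return;
      }
      /* True bootstrap: empty pool and never connected before. */
      contact_bootstrap_cache();
  }

The whole point is the middle branch: once ever_connected is set, the
caches drop out of the picture for good.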

GTKG is far too trigger-happy about banning "unstable" IPs and throwing
addresses away because some arcane per-vendor limit is reached. I mean,
it can really cause any damage it likes to itself, but this must
not be achieved by using up and effectively wasting cache resources.
UHC also means that you ping peers from your own pool for more addresses;
GTKG just throws them away for no reason!

We can all happily blame other clients for being 100x worse, but
that doesn't really help anyone. Oh, and I don't write this elsewhere
because I've already written most of it several times, and I know
that those vendors don't give a damn or just don't grasp it.

-- 
Christian
