On Nov 30 2007, Matthew Toseland wrote:
> Increasing MAX_PING_TIME would have no effect, for example, because most 
> nodes mostly reject on bandwidth liability.

MAX_PING_TIME was just an example - my point is that if we know most nodes 
aren't using the available bandwidth, we should tweak the rejection 
thresholds until most nodes hit their bandwidth limits. That doesn't 
require any new algorithms, just tuning the constants of the existing ones.

> But the point I am making is 
> *we don't even limit effectively on bandwidth liability* : busy-looping 
> until a request gets through shouldRejectRequest() improves performance 
> significantly, therefore backoff and AIMD is not supplying enough 
> requests to the front end of the current load limiting system.

To play devil's advocate for a minute: maybe it only improves performance 
because we're hammering our peers with so many requests that probabilistic 
rejection is effectively circumvented (sooner or later the coin will come 
up heads). This isn't necessarily a good strategy.
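
To make that concrete (a toy calculation only - the rejection probability
is assumed, and this isn't Freenet code): if each attempt is rejected
independently with probability p, a busy-loop that retries n times gets
through with probability 1 - p^n, which approaches 1 very quickly.

    // Toy illustration, not Freenet code: assume each pass through
    // shouldRejectRequest() rejects independently with probability p.
    // Retrying n times succeeds with probability 1 - p^n.
    public class RetryOdds {
        public static void main(String[] args) {
            double p = 0.9; // assumed per-attempt rejection probability
            for (int n : new int[] {1, 10, 50}) {
                System.out.printf("%2d attempts: P(accepted) = %.3f%n",
                                  n, 1 - Math.pow(p, n));
            }
        }
    }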

I'm not opposed to disabling AIMD and replacing backoff with explicit 
"start/stop" signals, but I'm not convinced it will fix anything either.

> Yes. Well really it's a form of token passing, but I'm trying to make it 
> simple and obviously correct.

It's not really token passing - a peer that receives the "start" signal can 
send unlimited requests until it receives the "stop" signal, which amounts 
to pre-emptive rejection. With token passing the peer knows exactly how 
many requests it can send, so there's no need for pre-emptive rejection.
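
For comparison, here's roughly what I mean by token passing (the class and
names are mine, just a sketch): the sender only forwards a request while it
holds a token, so the receiver never has to reject anything.

    // Minimal sketch of token passing - my naming, not an existing
    // Freenet class. The sender can only forward while it holds a
    // token granted by the downstream peer.
    public class TokenWindow {
        private int tokens; // tokens granted by the downstream peer

        public synchronized void grant(int n) {
            tokens += n; // peer tells us how many more requests it will accept
        }

        public synchronized boolean trySend() {
            if (tokens == 0)
                return false; // hold the request instead of sending and being rejected
            tokens--;
            return true;
        }
    }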

That's not to say that I think token passing is better than your proposal - 
we never settled the question of how many tokens to hand out or how to 
allocate them, for example. A simple solution is definitely preferable. 
However, there's a reason most protocols don't use simple start/stop flow 
control: it's hard to get good performance because the peer's response is 
delayed by one RTT and you can't make smooth adjustments (it's all or 
nothing).
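
The receiver's side of start/stop might look something like this (the water
marks and names are made up) - note there's no way to ask for "a bit less",
and requests already in flight keep arriving for about one RTT after we
decide to send "stop":

    // Rough sketch of the receiver's side of start/stop flow control
    // (thresholds and names are assumptions). Adjustment is all or
    // nothing, and the sender only reacts after roughly one RTT.
    public class StartStopReceiver {
        private static final int HIGH_WATER = 100; // assumed queue thresholds
        private static final int LOW_WATER = 50;
        private boolean stopped = false;

        public synchronized void onQueueSize(int queued, Peer peer) {
            if (!stopped && queued > HIGH_WATER) {
                peer.send("stop");
                stopped = true;
            } else if (stopped && queued < LOW_WATER) {
                peer.send("start");
                stopped = false;
            }
        }

        public interface Peer { void send(String signal); }
    }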

To be honest I think we're just trying to compensate for a broken transport 
layer. Look at the way HTTP handles flow control: it doesn't. Flow control 
is left to the transport layer. Requests can be pipelined; if you're busy 
processing the last request, don't read another one from the socket. To 
handle timeouts, add a timestamp to the request and skip it if the 
timestamp indicates that the previous hop will have timed out and moved on.
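
Something like the following (purely illustrative - the class, field names
and timeout value are my assumptions): requests carry the time they were
sent, we only pull one off the queue when we have spare capacity, and we
silently skip anything the previous hop must already have given up on.

    // Illustrative sketch only, not Freenet code. Flow control is left
    // to the queue: nextLive() is called only when the previous request
    // has been fully processed, and stale requests are skipped.
    public class RequestQueue {
        private static final long TIMEOUT_MS = 10000; // assumed per-hop timeout

        public static class Request {
            final long sentAt; // timestamp added by the previous hop
            Request(long sentAt) { this.sentAt = sentAt; }
        }

        private final java.util.ArrayDeque<Request> queue =
            new java.util.ArrayDeque<Request>();

        public synchronized void enqueue(Request r) { queue.add(r); }

        public synchronized Request nextLive() {
            Request r;
            while ((r = queue.poll()) != null) {
                if (System.currentTimeMillis() - r.sentAt < TIMEOUT_MS)
                    return r;
                // The previous hop has timed out and moved on - skip it.
            }
            return null;
        }
    }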

> We are not talking about the same queue. Local requests from the 
> various client-layer queues have to go through exactly the same process.

Sorry, I realised that after sending the message. :-)

> I mean that requests queued may not be successfully forwarded because 
> they are too far away from any of our peers' locations, yet since they 
> don't go away, our peers cannot send us any more requests which are 
> closer to the target. I believe what I said about this is sufficient.

I must have missed something - does the twice-the-median limit only apply 
to misrouted requests? If it applies to all requests, then either we can 
send the head of the queue to *someone* or we can't send anything to 
anyone. Either way there's no way for a "bad" request to block a "good" 
request.

Cheers,
Michael
