On 4 May 2015, at 11:10, Paolo Valente <[email protected]> wrote:
> I have tried to fully digest this information (thanks), but there is still
> some piece that I am missing. To highlight it, I would like to try with an
> oversimplified example. I hope this will make it easier to point out flaws
> in my understanding.
>
> Suppose that one wants/needs to discover whether outbound and/or inbound
> packets experience high internal queueing delays in a given node A, because
> some buffers are bloated (inside the node). For any packet leaving or
> entering the node, regardless of whether the packet exits the node after
> experiencing a high internal output-queueing delay, or whether the packet
> will experience a high internal input-queueing delay after being received
> by the node, the per-hop or end-to-end delays experienced by the packet
> outside the node are exactly the same. If this statement is true, then,
> since no information of any sort is available about queueing delays inside
> the node, and since the delays measurable from outside the node are
> invariant with respect to the internal queueing delays, how can we deduce
> internal delays from external ones?

Paolo, as you surmise - without appropriate observation points you can’t *definitively* isolate where the “delay” (really ∆Q) is accruing. If you can construct time-traces at more than one observation point, you can isolate it to between observation points.

If you have a model of how it is supposed to behave (i.e. have knowledge about the intermediate network elements, along with some idea of their configuration), you can start - by observing the pattern of change in the delay/loss - to make inferences that certain elements are being driven in a certain way. But beware: the model of inference is key. Use of “bandwidth”, or even just “delay”, doesn’t work; it does appear to work if you use ∆Q|G,S,V as the basis set.

Neil

> Thanks,
> Paolo
>
> On 28 Apr 2015, at 12:23, Neil Davies <[email protected]> wrote:
>
>> On 28 Apr 2015, at 10:58, Sebastian Moeller <[email protected]> wrote:
>>
>>> Hi Neil,
>>>
>>> On Apr 28, 2015, at 09:17, Neil Davies <[email protected]> wrote:
>>>
>>>> Jonathan
>>>>
>>>> The timestamps don't change very quickly - dozens (or more) of packets
>>>> can have the same timestamp, so it doesn't give you the appropriate
>>>> discrimination power. Timed observations at key points give you all you
>>>> need (actually, appropriately gathered, they give you all you can
>>>> possibly know - by observation).
>>>
>>> But this has two issues:
>>> 1) “timed observations”: relatively easy if all nodes are under your
>>> control, otherwise hard. I know about the CERN paper, but they had all
>>> nodes under their control, symmetric bandwidth and a shipload of
>>> samples, so over the wild internet “timed observations” are still hard
>>> (and harder as the temporal precision requirement goes up).
>>
>> ∆Q (with its improper CDF semantics and G, S and V basis set) has
>> composition and de-composition properties - this means that you don’t
>> need to be able to observe everywhere. Even in Lucian’s case his
>> observation points were limited (certain systems) - the rest of the
>> analysis is derived using the properties of the ∆Q calculus.
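[Editor's note: the composition/de-composition property mentioned above can be illustrated with a small numerical sketch. This is illustrative only (the numbers and function names are invented, not the actual ∆Q tooling): each network segment's delay is a discrete probability mass function over fixed time bins, and two segments in series compose by convolution, so an end-to-end distribution can be apportioned between observation points.]

```python
import numpy as np

# Hedged sketch: represent each segment's delay as a discrete PMF over 1 ms
# bins. (In full ∆Q, mass that never arrives models loss, giving the
# "improper CDF" semantics mentioned above; omitted here for brevity.)

def compose(pmf_a, pmf_b):
    """End-to-end delay PMF of two segments in series = convolution."""
    return np.convolve(pmf_a, pmf_b)

# Segment A: delay of 2, 3 or 4 ms; segment B: delay of 1 or 2 ms.
a = np.array([0.0, 0.0, 0.5, 0.3, 0.2])
b = np.array([0.0, 0.6, 0.4])
e2e = compose(a, b)

bins = np.arange(len(e2e))
mean_a = (np.arange(len(a)) * a).sum()
mean_b = (np.arange(len(b)) * b).sum()
mean_e2e = (bins * e2e).sum()
# Means add under composition, but the full distribution carries the
# information that matters (tails, jitter), which is the point of ∆Q.
print(mean_a, mean_b, mean_e2e)
```

De-composition runs the other way: given the end-to-end distribution and one segment's distribution, the other segment's contribution can be recovered (deconvolution), which is how observation at a limited set of points can still bound what happens in between.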
>>
>> Lucian also demonstrated how the standard timing observations (which
>> include issues of clock drift and distributed accuracy) can be resolved
>> in a practical situation - he reproduced, starting from libpcap captures
>> on machines, results for which the CERN guys had built specialist h/w
>> with better than 20ns timing only 5 years before.
>>
>> The good thing about Lucian’s thesis is that it is in the public domain -
>> but we use the same approach over wide (i.e. world) networks and get the
>> same properties (unfortunately that is done in a commercial context).
>> This all arises because we can perform the appropriate measurement error
>> analysis, and hence use standard statistical techniques.
>>
>>> 2) “key points”: once you know the key points you already must have a
>>> decent understanding of the effective topology of the network, which
>>> again over the wider internet is much harder than if one has all nodes
>>> under control.
>>
>> Not really - the key points (as a start) are the end ones - and those you
>> have (reasonable) access to. Even if you don’t have access to the
>> *actual* end points, you can easily spin up a measurement point that is
>> very close (in ∆Q terms) to the ones you are interested in - AWS and
>> Google Compute are your friends here.
>>
>>> I am not sure how Paolo’s “no-touching” problem fits into the
>>> requirements for your ∆Q (meta-)math ;)
>>
>> I see “no touching” as “no modification” - you can’t deduce information
>> in the absence of data. What you need to understand is the minimum data
>> requirements to achieve the measurement outcome - the ∆Q calculus gives
>> you that handle.
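[Editor's note: the clock-drift problem raised above - comparing libpcap timestamps taken on two unsynchronised machines - is commonly handled by fitting the lower envelope of the measured one-way delays. The sketch below uses synthetic data and invented parameters (it is not Lucian's actual method), and assumes that the minimum-delay packets in each time window experienced essentially no queueing.]

```python
import numpy as np

# Raw one-way "delay" = true delay + clock offset + clock skew * send_time.
# The lower envelope of the raw delays tracks (offset + skew * t), because
# minimum-delay packets saw (almost) no queueing; subtracting the fitted
# envelope leaves the queueing-induced delay variation.

rng = np.random.default_rng(1)
send_t = np.sort(rng.uniform(0, 100, 2000))        # seconds, sender's clock
base = 0.005                                        # 5 ms propagation (fixed)
queue = rng.exponential(0.002, send_t.size)         # queueing component
offset, skew = 3.2, 50e-6                           # clock error: 3.2 s + 50 ppm
raw = base + queue + offset + skew * send_t

# Fit a line through windowed minima (the low-delay envelope).
edges = np.linspace(0, 100, 21)
idx = np.digitize(send_t, edges)
mins_t = np.array([send_t[idx == k][np.argmin(raw[idx == k])]
                   for k in np.unique(idx)])
mins_d = np.array([raw[idx == k].min() for k in np.unique(idx)])
slope, intercept = np.polyfit(mins_t, mins_d, 1)

corrected = raw - (intercept + slope * send_t)      # ≈ queueing variation
print(slope, corrected.min())
```

The recovered slope approximates the relative skew between the two clocks; what survives in `corrected` is exactly the delay *variation* that a ∆Q-style analysis needs, without requiring the two hosts' clocks to agree in absolute terms.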
>>
>>> Best Regards
>>> Sebastian
>>>
>>>> Neil
>>>>
>>>> On 28 Apr 2015, at 00:11, Jonathan Morton <[email protected]> wrote:
>>>>
>>>>> On 27 Apr 2015 23:31, "Neil Davies" <[email protected]> wrote:
>>>>>>
>>>>>> Hi Jonathan
>>>>>>
>>>>>> On 27 Apr 2015, at 16:25, Jonathan Morton <[email protected]> wrote:
>>>>>>
>>>>>>> One thing that might help you here is the TCP Timestamps option. The
>>>>>>> timestamps thus produced are opaque, but you can observe them and
>>>>>>> measure the time intervals between their production and echo. You
>>>>>>> should be able to infer something from that, with care.
>>>>>>>
>>>>>>> To determine the difference between loaded and unloaded states, you
>>>>>>> may need to observe for an extended period of time. Eventually
>>>>>>> you'll observe some sort of bulk flow, even if it's just a software
>>>>>>> update cycle. It's not quite so certain that you'll observe an idle
>>>>>>> state, but it is sufficient to observe an instance of the link not
>>>>>>> being completely saturated, which is likely to occur at least
>>>>>>> occasionally.
>>>>>>>
>>>>>>> - Jonathan Morton
>>>>>>
>>>>>> We looked at using TCP timestamps early on in our work. The problem
>>>>>> is that they don't really help extract the fine-grained information
>>>>>> needed. The timestamps can move in very large steps, and the accuracy
>>>>>> (and precision) can vary widely from implementation to
>>>>>> implementation.
>>>>>
>>>>> Well, that's why you have to treat them as opaque, just like I said.
>>>>> Ignore whatever meaning the end host producing them might embed in
>>>>> them, and simply watch which ones get echoed back and when. You only
>>>>> have to rely on the resolution of your own clocks.
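[Editor's note: the "treat them as opaque" scheme can be sketched as follows. This is a toy driver with invented values, not a real packet capture: record the local time at which each outbound TSval is first seen; when an inbound segment echoes that value in TSecr, the local elapsed time is an upper bound on the RTT (the receiver may delay its ack, so it is not a lower bound).]

```python
# Hedged sketch: RTT upper bounds from opaque TCP timestamps, keyed only by
# our own clock. Class and method names are illustrative.

class TsRttEstimator:
    def __init__(self):
        self.first_seen = {}     # TSval -> local time of first sighting
        self.samples = []        # RTT upper bounds, in seconds

    def outbound(self, tsval, now):
        # Record only the *first* sighting: retransmissions reuse TSvals
        # and would otherwise bias the estimate low.
        self.first_seen.setdefault(tsval, now)

    def inbound(self, tsecr, now):
        sent = self.first_seen.pop(tsecr, None)
        if sent is not None:
            self.samples.append(now - sent)

# Toy trace: TSval 1000 first sent at t=0.000 s, echoed back at t=0.042 s.
est = TsRttEstimator()
est.outbound(1000, 0.000)
est.outbound(1000, 0.010)    # retransmission of the same TSval: ignored
est.inbound(1000, 0.042)
print(est.samples)
```

In a real deployment the `outbound`/`inbound` calls would be driven from a packet capture (e.g. libpcap), and - as discussed below - each sample should be discounted when the stream is mostly idle, since delayed acks and Nagle's algorithm inflate the upper bound.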
>>>>>
>>>>>> The timestamps are there to try and get a gross (if my memory serves
>>>>>> me right, ~100ms) approximation to the RTT - not good enough for
>>>>>> reasoning about TCP-based interactive/"real time" apps.
>>>>>
>>>>> On the contrary, these timestamps can indicate much better precision
>>>>> than that; in particular they indicate an upper bound on the
>>>>> instantaneous RTT, which can be quite tight under favourable
>>>>> circumstances. On a LAN, you could reliably determine that the RTT
>>>>> was below 1ms this way.
>>>>>
>>>>> Now, what it doesn't give you is a strict lower bound. But you can
>>>>> often look at what's going on in that TCP stream and determine that
>>>>> favourable circumstances exist, such that the upper-bound RTT
>>>>> estimate is probably reasonably tight. Or you could observe that the
>>>>> stream is mostly idle, and thus probably influenced by delayed acks
>>>>> and Nagle's algorithm, and discount that measurement accordingly.
>>>>>
>>>>> - Jonathan Morton
>
> --
> Paolo Valente
> Algogroup
> Dipartimento di Fisica, Informatica e Matematica
> Via Campi, 213/B
> 41125 Modena - Italy
> homepage: http://algogroup.unimore.it/people/paolo/

_______________________________________________
Bloat mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/bloat
