Thanks Michaël

I may have been a little muffled in the podcast, but I was saying 
"responsive" rather than "reactive". You are correct below: it is the 
average wait time that reduces by ~20X as a result of halving the service 
time when utilisation is 90%, so the system offering the service is more 
responsive to user requests.

This is why utilisation plays such an important role in systems 
performance: it impacts response time, i.e. latency. This applies to all 
computing resources. The higher the utilisation, the higher the probability 
the resource will be busy when a request arrives, and thus queues form. 
This can be seen with CPUs when you have more runnable threads than CPUs 
available for them to run on. The model for CPUs is more complicated when 
preemption and priorities come into play.
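
To make that concrete, here is a minimal sketch in Java (using the M/M/1 
wait time formula ρ/(μ − λ) that you quote below) which prints both of 
your cases and shows how the wait time explodes as utilisation approaches 
100%:

    public final class MM1Wait
    {
        // M/M/1 mean time waiting in the queue (excluding service): Wq = rho / (mu - lambda)
        static double waitTimeSeconds(final double lambda, final double mu)
        {
            final double rho = lambda / mu;
            return rho / (mu - lambda);
        }

        public static void main(final String[] args)
        {
            // The podcast example: 9 requests/s against a 100ms vs 50ms service time
            System.out.printf("100ms service: wait = %.0f ms%n", waitTimeSeconds(9, 10) * 1000);
            System.out.printf(" 50ms service: wait = %.1f ms%n", waitTimeSeconds(9, 20) * 1000);

            // Fixed 100ms service time (mu = 10/s) with rising utilisation
            for (int pct = 50; pct <= 95; pct += 15)
            {
                final double lambda = (pct / 100.0) * 10;
                System.out.printf("utilisation %d%%: wait = %.0f ms%n",
                    pct, waitTimeSeconds(lambda, 10) * 1000);
            }
        }
    }

Note how the step from 50% to 95% utilisation is anything but linear: the 
last few percent dominate the wait time.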

The cloud model is about reducing cost by increasing utilisation. We can 
see VMs get sluggish on oversubscribed systems. If you want predictable 
response times then you need dedicated resources and low utilisation, but 
this comes at a cost. It is an engineering trade-off.

The really interesting question is what time interval the average 
utilisation is measured over. In financial trading, measuring average 
utilisation in units of seconds is pointless. The really interesting 
intervals are bursts of a few microseconds. It is the average utilisation 
within those bursts which makes the difference, and it is why batching is 
a key technique for reducing utilisation by amortising costs. Creating a 
batch of 2 can halve the utilisation.
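
As a back-of-the-envelope illustration (the costs below are made-up 
numbers, not measurements), if each operation pays a fixed cost plus a 
small per-item cost, then batching amortises the fixed cost across the 
batch:

    public final class BatchingAmortisation
    {
        public static void main(final String[] args)
        {
            final double fixedCostUs = 1.0;    // hypothetical fixed cost per operation, e.g. one network write
            final double perItemCostUs = 0.05; // hypothetical marginal cost per item in the batch
            final double arrivalsPerSecond = 500_000;

            for (final int batchSize : new int[] {1, 2, 4, 8})
            {
                final double usPerItem = fixedCostUs / batchSize + perItemCostUs;
                final double utilisation = arrivalsPerSecond * usPerItem / 1_000_000;
                System.out.printf("batch of %d: %.3f us/item, utilisation = %.0f%%%n",
                    batchSize, usPerItem, utilisation * 100);
            }
        }
    }

With the fixed cost dominating, going from a batch of 1 to a batch of 2 
takes the utilisation from ~53% to ~28%, i.e. close to halving it.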


Martin...

On Saturday, 28 September 2019 14:22:12 UTC+1, Michaël REMOND wrote:
>
> Hello,
>
> I have known about Martin Thompson's excellent work for a good while now, 
> and recently I wanted to better understand some queuing theory he 
> discussed on the Arrested DevOps podcast.
>
> He gave the example of a service with an average response time of 100ms 
> that receives on average 9 requests/second.
> If the service response time is halved (50ms), Martin said that the 
> service becomes 20x more "reactive".
>
> I was not sure what he meant exactly by reactive, but I tried to 
> understand where this 20x came from.
>
> Here is what I came up with, and I wanted some experienced people to 
> confirm whether my reasoning is correct.
>
> I assume that Martin is considering the M/M/1 queue model. Then the theory 
> says that on average, the waiting time (not the sojourn time) is: 
> ρ/(μ − λ).
> With the real numbers, we have:
>  - case 1 (service time = 100ms): waiting time = (9/10) / (10 - 9) = 0.9s 
> = 900ms
>  - case 2 (service time = 50ms): waiting time = (9/20) / (20 - 9) ≈ 0.041s 
> ≈ 41ms
>
> And then 900 / 41 ≈ 22. So I think that when Martin is saying that the 
> system is more reactive, he is talking about the waiting time.
>
> So the first take-away for me is that our systems should not run at a 
> utilization higher than 80% (see 
> https://www.johndcook.com/blog/2009/01/30/server-utilization-joel-on-queuing/#comment-13511
> ).
>
> So my next question is: should I consider this result true for our CPUs 
> and memory as well?
>
> I am particularly interested in this question, because I read in the 
> Google SRE Book that for optimal resource utilization (and therefore for 
> optimal costs), they try to keep their CPUs almost always full of tasks.
>
> Thank you in advance for your responses and thoughts on this subject.
>
> Michaël
>
