Thanks Michaël. I may have been a little muffled in the podcast, but I was saying "responsive" rather than "reactive". You are correct below: it is the average *wait time* that reduces by ~20X as a result of halving the service time when utilisation is 90%, and therefore the system offering the service is more responsive to user requests.
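To make that ~20X concrete, here is a minimal Python sketch of the M/M/1 mean waiting-time formula quoted below, Wq = ρ/(μ − λ). The numbers (9 requests/second arriving, service time halved from 100 ms to 50 ms) are taken from the example in the thread; the function name is just for illustration:

```python
def mean_wait(arrival_rate, service_time):
    """Average queueing delay (seconds) for an M/M/1 queue: Wq = rho / (mu - lambda)."""
    mu = 1.0 / service_time      # service rate (requests/second)
    rho = arrival_rate / mu      # utilisation
    assert rho < 1.0, "queue is unstable at or above 100% utilisation"
    return rho / (mu - arrival_rate)

lam = 9.0                         # arrivals per second
w_100ms = mean_wait(lam, 0.100)   # 0.9 s average wait (utilisation 90%)
w_50ms = mean_wait(lam, 0.050)    # ~0.041 s average wait (utilisation 45%)
print(w_100ms / w_50ms)           # ~22x reduction in average wait
```

Halving the service time both doubles the service rate and halves the utilisation, which is why the wait falls so much faster than 2x.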
This is why utilisation plays such an important role in systems performance: it impacts response time, or latency. This applies to all computing resources. The higher the utilisation, the higher the probability the resource will already be in use, and thus queues can form. This can be seen with CPUs when you have more threads requiring a CPU to run on than CPUs available. The model with CPUs is more complicated once preemption and priority come into play.

The cloud model is about cost reduction by increasing utilisation. We can see VMs get sluggish on oversubscribed systems. If you want predictable response times then you need dedicated resources and low utilisation, but this comes at a cost. It is an engineering trade-off.

The really interesting question is what time interval the average utilisation is measured over. In financial trading, measuring average utilisation in units of seconds is pointless. The really interesting intervals are bursts of a few microseconds. It is the average utilisation within those bursts which makes the difference, and it is why batching is a key technique for reducing utilisation by amortising costs. Creating a batch of 2 can halve the utilisation.

Martin...

On Saturday, 28 September 2019 14:22:12 UTC+1, Michaël REMOND wrote:
>
> Hello,
>
> I have known about Martin Thompson's excellent work for a good while now,
> and recently I wanted to better understand some queuing theory he
> discussed in the Arrested DevOps podcast.
>
> He gave the example of a service having an average response time of
> 100 ms, and this service receives on average 9 requests/second.
> If the service response time is divided by 2 (50 ms), Martin said that
> the service becomes 20x more "reactive".
>
> I was not sure what he meant exactly by "reactive", but I tried to
> understand where this 20x came from.
>
> Here is what I came up with, and I wanted some experienced people to
> confirm whether my reasoning is correct or not.
>
> I assume that Martin is considering the M/M/1 queue model.
> Then the theory says that on average, the *waiting time* (not the sojourn
> time) is: ρ / (μ − λ).
> With the real numbers, we have:
> - case 1 (service time = 100 ms): waiting time = (9/10) / (10 − 9) = 0.9 s = 900 ms
> - case 2 (service time = 50 ms): waiting time = (9/20) / (20 − 9) ≈ 0.04 s = 40 ms
>
> And then 900 / 40 ≈ 22. So I think that when Martin says the system is
> more reactive, he is talking about the waiting time.
>
> So the first take-away for me is that our systems should not run at a
> utilization bigger than 80% (see
> https://www.johndcook.com/blog/2009/01/30/server-utilization-joel-on-queuing/#comment-13511
> ).
>
> So my next question is: should I consider this result true for our CPUs
> and memory as well?
>
> I am particularly interested in this question, because I read in the
> Google SRE Book that for optimal resource utilization (and thereafter for
> optimal costs), they try to make their CPUs almost always full of tasks.
>
> Thank you in advance for your responses and thoughts on this subject.
>
> Michaël

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/mechanical-sympathy/57ba4ecf-066e-48aa-a8b7-3c2f7175ae5f%40googlegroups.com.
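The 80% rule of thumb linked above falls straight out of the same formula. A short sketch (my own illustration, not from the thread) holds the service time at 100 ms and sweeps the utilisation, showing how nonlinearly the average wait blows up as utilisation approaches 100%:

```python
# Average M/M/1 queueing delay, Wq = rho / (mu - lambda), for a fixed
# 100 ms service time (mu = 10 requests/second), at rising utilisation.
service_time = 0.100        # seconds
mu = 1.0 / service_time     # service rate (requests/second)

for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    lam = rho * mu                  # arrival rate that gives this utilisation
    wait = rho / (mu - lam)         # average queueing delay, seconds
    print(f"utilisation {rho:4.0%}: average wait {wait * 1000:7.1f} ms")
```

Going from 50% to 80% utilisation quadruples the average wait; going from 90% to 99% roughly multiplies it by eleven, which is why throughput-optimised (CPU-saturating) and latency-sensitive workloads pull in opposite directions.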