You're absolutely correct. My point was that even much less than 2 requests per query, on average, can be sufficient.
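A rough sketch of what "less than 2" could look like: send the backup only if the primary has not answered within a delay, and count how many requests go out per query on average. The latency mixture, the function names and the sample delays are assumptions for illustration, not anything measured on a real cluster. A delay of 0 corresponds to the universal double query, and an infinite delay to no backups at all.

```python
import math
import random


def sample_latency_ms(rng):
    """Draw one response time from an ASSUMED two-component log-normal mixture."""
    # Assumption: 97% of responses are "fast" (median 20 ms), 3% are "slow"
    # (median 60 ms). These parameters are illustrative only.
    if rng.random() < 0.97:
        return rng.lognormvariate(math.log(20.0), 0.3)
    return rng.lognormvariate(math.log(60.0), 0.6)


def simulate_backup(delay_ms, sla_ms=50.0, trials=100_000, seed=1):
    """Return (SLA miss rate, average requests per query) for a delayed backup."""
    rng = random.Random(seed)
    misses = 0
    requests = 0
    for _ in range(trials):
        primary = sample_latency_ms(rng)
        requests += 1
        latency = primary
        if primary > delay_ms:
            # The primary has not answered by delay_ms, so a backup is sent;
            # the query completes when the earlier of the two responds.
            requests += 1
            latency = min(primary, delay_ms + sample_latency_ms(rng))
        if latency > sla_ms:
            misses += 1
    return misses / trials, requests / trials


if __name__ == "__main__":
    # Hypothetical delays to compare; math.inf means "never send a backup".
    for delay in (0.0, 20.0, 30.0, math.inf):
        miss, load = simulate_backup(delay)
        print(f"delay={delay:>5} ms  miss rate={miss:.5f}  requests/query={load:.3f}")
```

How much of the two-replica benefit survives depends on how the delay compares with the SLA and the typical response time, so the delay is worth tuning against real latency data.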
On Thu, Sep 13, 2012 at 1:40 AM, Ted Dunning wrote:

> It isn't a doubling. It is a power.
>
> If the probability of exceeding the SLA is p, then the probability that
> two independent resources will both exceed the SLA is p^2. For three,
> the probability is p^3.
>
> To be concrete, I just did a simulation with a mixture of two log-normal
> distributions. Using a mixture distribution here is important to emulate
> the long-tailed nature of response time distributions ... it doesn't
> suffice to use normal distributions.
>
> With a long-tailed distribution that has a median response of 20 ms, the
> raw distribution has about a 2% chance of a response > 50 ms. Using the
> faster of two responses gives a probability of a > 50 ms response of
> 0.04%. Three responses give a probability of 0.0008%. For most
> applications, the difference between 2 and 3 replicated queries is nil.
>
> Moreover, if the second query has an artificial delay of a few ms, you
> get nearly the same improvement in the probability of meeting the SLA,
> but you pay a much lower average cost because you rarely invoke the
> redundant queries.
>
> So the reason that 2 are used instead of 3 is that 2 helps a lot while 3
> only improves things slightly more.
>
> On Wed, Sep 12, 2012 at 1:01 PM, Constantine Peresypkin wrote:
>
> > If you do a double query you're increasing your chances of success by
> > a factor of 2 only.
> > Why not triple or quadruple?
> >
> > On Wed, Sep 12, 2012 at 10:14 PM, Ted Dunning wrote:
> >
> > > Heavens... we can easily satisfy both needs.
> > >
> > > Just have a parameter that can be set to 0 (= universal double
> > > query) or Integer.MAX_VALUE to get no backups at all.
> > >
> > > On Wed, Sep 12, 2012 at 11:47 AM, Constantine Peresypkin wrote:
> > >
> > > > > The PowerDrill paper also mentions a variant of this where each
> > > > > query fragment is sent to two machines, and the results for that
> > > > > fragment are used from whatever machine responds first.
> > > >
> > > > Sending each query or request twice will increase cluster load by
> > > > 100%.
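
For anyone who wants to reproduce the shape of the p, p^2, p^3 argument quoted above, the sketch below estimates the chance that the fastest of k independent replicas misses a 50 ms SLA. The mixture parameters (97% of responses log-normal around 20 ms, 3% around 60 ms), the function names and the trial count are assumptions, picked only so that roughly 2% of single responses exceed 50 ms; they are not the distribution used in the simulation described in the thread.

```python
import math
import random


def sample_latency_ms(rng):
    """Draw one response time from an ASSUMED two-component log-normal mixture."""
    # Assumption: 97% "fast" responses (median 20 ms), 3% "slow" responses
    # (median 60 ms), giving a long right tail. Illustrative values only.
    if rng.random() < 0.97:
        return rng.lognormvariate(math.log(20.0), 0.3)
    return rng.lognormvariate(math.log(60.0), 0.6)


def tail_probability(k, sla_ms=50.0, trials=100_000, seed=1):
    """Estimate P(the fastest of k independent replicas exceeds sla_ms)."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        # The query is answered by whichever replica responds first.
        fastest = min(sample_latency_ms(rng) for _ in range(k))
        if fastest > sla_ms:
            misses += 1
    return misses / trials


if __name__ == "__main__":
    p1 = tail_probability(1)
    for k in (1, 2, 3):
        estimate = tail_probability(k)
        # Compare the simulated value with the independence prediction p1**k.
        print(f"k={k}  simulated={estimate:.6f}  p1**k={p1 ** k:.6f}")
```

If the replicas really are independent, the k=2 estimate should land near the square of the k=1 estimate. Misses with k=3 are so rare at this trial count that the estimate is noisy, which is itself a hint that the third replica buys little.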
