The paper mentions how they selectively replicate different subsets of the data. They use 'china queries' or somesuch as their example.
My understanding is that there is some kind of query/subset monitor that detects hot spots and then increases their replication count across the farm. It must also be responsible for decreasing the count as the hot spots cool down again.

regards
Ian

On Sep 12, 2012, at 12:31 PM, Ted Dunning <[email protected]> wrote:

> What do you mean by selective replication?
>
> On Tue, Sep 11, 2012 at 7:23 PM, Worthy LaFollette <[email protected]> wrote:
>
>> Very good paper. Am curious now about the strategies for selective
>> replication, which, if done right, looks like it would make the query
>> generation more efficient. Do you know of any papers on that subject?
>>
>> On Tue, Sep 11, 2012 at 1:37 PM, Ted Dunning <[email protected]>
>> wrote:
>>
>>> Headed into Thursday's meetup, this paper by Jeff Dean provides a very good
>>> description of strategies for getting fast response times with variable
>>> quality infrastructure.
>>>
>>> http://research.google.com/people/jeff/latency.html
>>>
>>> The key point here is that it is very important to have asynchronous
>>> queries with a cancel. Above that level, there needs to be a simple
>>> strategy for pushing second versions of queries out to the workers and
>>> canceling defunct or redundant queries.
>>>
>>

--
Ian Holsman
[email protected]
http://doitwithdata.com.au
PH: +61-400-988-964 Skype:iholsman
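
P.S. To make the monitor idea a bit more concrete, here is a minimal sketch of how I imagine it could work. This is my own guess, not something taken from the paper: the class name, the queries-per-replica threshold, the replica cap and the decay factor are all made-up assumptions. It counts queries per data subset, asks for more replicas when a subset gets hot, and lets the count fall back as the load decays.

```python
import math
from collections import defaultdict


class ReplicationMonitor:
    """Toy hot-spot monitor (hypothetical, not the paper's design)."""

    def __init__(self, base_replicas=1, max_replicas=8,
                 queries_per_replica=100, decay=0.5):
        self.base_replicas = base_replicas              # floor for every subset
        self.max_replicas = max_replicas                # cap, so one hot spot cannot eat the farm
        self.queries_per_replica = queries_per_replica  # assumed capacity of one replica per interval
        self.decay = decay                              # how fast old load is forgotten
        self.load = defaultdict(float)                  # subset -> decayed query count

    def record_query(self, subset, n=1):
        """Note that `n` queries touched `subset` in the current interval."""
        self.load[subset] += n

    def end_interval(self):
        """Return target replica counts per subset, then decay the counters."""
        targets = {}
        for subset, load in self.load.items():
            wanted = math.ceil(load / self.queries_per_replica)
            targets[subset] = max(self.base_replicas,
                                  min(self.max_replicas, wanted))
        # Decay so that a subset cools down once queries stop arriving.
        for subset in list(self.load):
            self.load[subset] *= self.decay
            if self.load[subset] < 1:
                del self.load[subset]
        return targets
```

In use, something would call record_query() for every incoming query and end_interval() on a timer, then add or drop replicas to match the returned targets. The decay is what handles the "cool again" side: with no new traffic, a hot subset's target shrinks a little each interval until it is back at the base count.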
