The paper mentions that they selectively replicate different subsets of the 
data. They use 'china queries' or something similar as their example.

My understanding is that some kind of query/subset monitor detects hot spots 
and then increases their replication count across the farm. It must also be 
responsible for decreasing the count as the hot spots cool down again.
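
To make that concrete, here is a rough sketch of the kind of monitor I am 
imagining. This is only my guess at the idea, not what the paper actually 
implements: the class name, the thresholds and the windowing scheme are all 
made up for illustration.

```python
from collections import Counter


class ReplicationMonitor:
    """Hypothetical hot-spot monitor; names and thresholds are illustrative."""

    def __init__(self, base_replicas=1, max_replicas=4, queries_per_replica=100):
        self.base_replicas = base_replicas              # floor for every subset
        self.max_replicas = max_replicas                # cap so one subset cannot take the farm
        self.queries_per_replica = queries_per_replica  # assumed capacity of one replica per window
        self.window = Counter()                         # queries seen per subset in the current window
        self.replicas = {}                              # subset -> current replica count

    def record_query(self, subset):
        """Count one query against a data subset (e.g. 'china')."""
        self.window[subset] += 1

    def rebalance(self):
        """Close the window and return {subset: new_count} for subsets that changed."""
        changes = {}
        for subset in set(self.window) | set(self.replicas):
            load = self.window[subset]
            # Ceiling division: enough replicas to absorb the observed load.
            wanted = -(-load // self.queries_per_replica)
            wanted = max(self.base_replicas, min(self.max_replicas, wanted))
            if self.replicas.get(subset, self.base_replicas) != wanted:
                changes[subset] = wanted
            self.replicas[subset] = wanted
        self.window.clear()
        return changes
```

Calling rebalance() once per time window raises the count for a subset while 
it is hot and drops it back to the base count once the queries stop.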

regards
Ian
On Sep 12, 2012, at 12:31 PM, Ted Dunning <[email protected]> wrote:

> What do you mean by selective replication?
> 
> On Tue, Sep 11, 2012 at 7:23 PM, Worthy LaFollette <[email protected]> wrote:
> 
>> Very good paper. I am now curious about strategies for selective
>> replication, which, if done right, looks like it would make query
>> generation more efficient.  Do you know of any papers on that subject?
>> 
>> On Tue, Sep 11, 2012 at 1:37 PM, Ted Dunning <[email protected]> wrote:
>> 
>>> Headed into Thursday's meetup, this paper by Jeff Dean provides a very good
>>> description of strategies for getting fast response times with variable
>>> quality infrastructure.
>>> 
>>> http://research.google.com/people/jeff/latency.html
>>> 
>>> The key point here is that it is very important to have asynchronous
>>> queries with a cancel.  Above that level, there needs to be a simple
>>> strategy for pushing second versions of queries out to the workers and
>>> canceling defunct or redundant queries.
>>> 
>> 
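
Ted's point above about asynchronous queries with a cancel can be sketched 
roughly as follows. This is a minimal illustration using Python's asyncio, not 
code from the paper: hedged_query, make_replica and the delay values are 
hypothetical names and numbers chosen for the example.

```python
import asyncio


async def hedged_query(replicas, query, hedge_delay=0.05):
    """Send `query` to the first replica; if it has not answered within
    `hedge_delay` seconds, send a second copy to the next replica.
    Return whichever answer arrives first and cancel the other request."""
    tasks = [asyncio.create_task(replicas[0](query))]
    try:
        # Give the primary a head start; a timeout here does not cancel it.
        done, _ = await asyncio.wait(tasks, timeout=hedge_delay)
        if not done:
            if len(replicas) > 1:
                # Primary is slow: push a second version of the query out.
                tasks.append(asyncio.create_task(replicas[1](query)))
            done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        # Cancel whichever request is now redundant.
        for task in tasks:
            if not task.done():
                task.cancel()


def make_replica(name, latency):
    """Stand-in for a real worker: answers after `latency` seconds."""
    async def replica(query):
        await asyncio.sleep(latency)
        return f"{name}:{query}"
    return replica
```

For example, asyncio.run(hedged_query([make_replica("slow", 0.5), 
make_replica("fast", 0.01)], "q", hedge_delay=0.02)) returns the answer from 
the fast replica and cancels the slow request instead of waiting for it.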

--
Ian Holsman
[email protected]
http://doitwithdata.com.au
PH: +61-400-988-964 Skype:iholsman

