I would like to point out that neither of these options (both good) would
affect query processing because replication is far too slow to help at
query time.

In another life, I found that we could predict popularity of video items
using only the very early life history of the items.  Similarly, I have had
good success predicting first weekend and total life-time revenue for a
movie based on the first 3 hours on opening night.  These are very
different domains, but I would think that data assets might be subject to
the same flash crowd effects and thus be somewhat predictable given early
interest.
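
As a rough illustration of predicting from early history, here is a minimal
sketch. It assumes a log-log linear relationship between an item's early
access count and its lifetime total; that model, the function names, and the
numbers are illustrative assumptions, not the method used in the video or
movie work mentioned above.

```python
import math

def fit_loglog(early_counts, total_counts):
    """Least-squares fit of log(total) = intercept + slope * log(early).

    All counts must be positive. The log-log form is an assumed model.
    """
    xs = [math.log(e) for e in early_counts]
    ys = [math.log(t) for t in total_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_total(early_count, slope, intercept):
    """Predict lifetime popularity from an early access count."""
    return math.exp(intercept + slope * math.log(early_count))

# Synthetic history, made up for illustration: (early accesses, lifetime accesses).
early = [10, 100, 1000]
total = [200, 2000, 20000]
slope, intercept = fit_loglog(early, total)
predicted = predict_total(50, slope, intercept)
```

A data asset whose predicted total is high could then be given extra replicas
ahead of demand, well before any query needs them.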

Seasonality and similar effects are also clearly visible in real customer
traffic.  For instance, it is common for traffic summaries to be very popular
for the first week and then have a popularity bump at the monthly, quarterly
and annual anniversaries.

On Tue, Sep 11, 2012 at 9:46 PM, Ian Holsman <[email protected]> wrote:

> I don't know of any papers off hand, but I would think you could go down
> two routes. A predictive trend algo to 'guess' which blocks could get hot
> based on seasonal traffic and a reactive one based on response time
> regularized by #replicas it is on.
>
> Sent from my iPhone
>
> On 12/09/2012, at 2:21 PM, Worthy LaFollette <[email protected]> wrote:
>
> > As Ian explained down thread, the paper gave two examples.  The first was
> > static seeding of duplicates, the second was dynamic with a suggestion of a
> > monitor which seeds additional copies based on some algorithm in response
> > to "hot" queries (China being the topic of the example given).  I am
> > curious if anyone was aware of any papers about this second part.  I can
> > almost see a cost model where the query measures the overall cost of a
> > query (latency, risk of latency?) and then generates copies in response.
> > Part of this of course would be a recovery mechanism which removes these
> > extra copies.
> >
> > W-
> >
> > On Tue, Sep 11, 2012 at 9:31 PM, Ted Dunning <[email protected]> wrote:
> >
> >> What do you mean by selective replication?
> >>
> >> On Tue, Sep 11, 2012 at 7:23 PM, Worthy LaFollette <[email protected]> wrote:
> >>
> >>> Very good paper. I am now curious about strategies for selective
> >>> replication, which, if done right, looks like it would make query
> >>> generation more efficient.  Do you know of any papers on that subject?
> >>>
> >>> On Tue, Sep 11, 2012 at 1:37 PM, Ted Dunning <[email protected]> wrote:
> >>>
> >>>> Headed into Thursday's meetup, this paper by Jeff Dean provides a very
> >>>> good description of strategies for getting fast response times with
> >>>> variable quality infrastructure.
> >>>>
> >>>> http://research.google.com/people/jeff/latency.html
> >>>>
> >>>> The key point here is that it is very important to have asynchronous
> >>>> queries with a cancel.  Above that level, there needs to be a simple
> >>>> strategy for pushing second versions of queries out to the workers and
> >>>> canceling defunct or redundant queries.
> >>
>
