That's a great idea...
You mean that instead of saving the blocks themselves, we can store metadata
({block-ids}) in HDFS for each file/shard that is written to the block-cache...
Opening a shard can then use this metadata to re-populate the hot parts of
the files...
We also need to handle evictions & file-deletes...
Is this what you are hinting at?
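
Something like this rough sketch, maybe (all class/method names here are
hypothetical, just to illustrate the idea -- not actual Blur APIs):

```java
import java.util.*;

// Sketch: track which blocks of which files are hot, in LRU order, so the
// manifest can be persisted (e.g. as a small side file in HDFS) and replayed
// to re-warm the cache when a shard is opened.
public class CacheManifest {

    // Keys are "fileName:blockId"; insertion order is maintained so the map
    // runs from least- to most-recently used.
    private final LinkedHashMap<String, Long> hotBlocks = new LinkedHashMap<>();

    // Record a cache hit: re-inserting moves the key to the MRU position.
    public void touch(String fileName, long blockId) {
        String key = fileName + ":" + blockId;
        hotBlocks.remove(key);
        hotBlocks.put(key, System.nanoTime());
    }

    // Keep the manifest in sync with cache evictions.
    public void evict(String fileName, long blockId) {
        hotBlocks.remove(fileName + ":" + blockId);
    }

    // Drop every entry for a deleted file so replay never touches it.
    public void deleteFile(String fileName) {
        hotBlocks.keySet().removeIf(k -> k.startsWith(fileName + ":"));
    }

    // One "fileName:blockId" per line, LRU first.
    public String serialize() {
        StringBuilder sb = new StringBuilder();
        for (String key : hotBlocks.keySet()) {
            sb.append(key).append('\n');
        }
        return sb.toString();
    }

    // On shard open, parse the manifest; the caller would issue a read for
    // each listed block to re-populate the cache.
    public static List<String> replay(String manifest) {
        List<String> toPrefetch = new ArrayList<>();
        for (String line : manifest.split("\n")) {
            if (!line.isEmpty()) {
                toPrefetch.add(line);
            }
        }
        return toPrefetch;
    }
}
```

The evict/deleteFile hooks are the part I was worried about -- without them a
stale manifest would pre-fetch blocks of files that no longer exist.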
--
Ravi
On Thu, Feb 5, 2015 at 7:03 PM, Aaron McCurry <[email protected]> wrote:
> On Thu, Feb 5, 2015 at 6:30 AM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > I noticed that Cassandra's BigTable-style implementation periodically
> > stores the "Memtable" info onto disk to avoid cold start-ups...
> >
> > Is it possible to do something like that for Blur's block-cache,
> > preferably in HDFS itself, so that both cold start-ups and shard
> > take-overs don't affect end-user latencies...
> >
> > In Cassandra's case, the size of a Memtable will typically be 2GB-4GB.
> > But in the case of Blur, it could even be 100 GB. So I don't know if
> > attempting such stuff is a good idea.
> >
> > Any help is appreciated much...
> >
>
> Yeah I agree that caches could be very large and storing in HDFS could be
> counterproductive. Also, the block cache represents what is on a single
> node, and it's not really broken up by shard or table. So if a node was
> restarted without a full cluster restart there's no guarantee that the
> shard server will get the same shards back that it was serving before.
>
> I like the idea though. Perhaps we could write out which parts of which
> files the cache was storing, along with the LRU order. Then any server that
> is opening the shard would know which parts of which files were hot the
> last time it was open, and could choose to populate the cache upon shard
> opening.
>
> Thoughts?
>
> Aaron
>
>
> >
> > --
> > Ravi
> >
>