Oh, I am sorry... I see that the Blur warmup code has now been completely removed from the repository. Is there a reason for its removal?
I thought auto-save/loading of the block-cache could benefit nicely from it.

--
Ravi

On Tue, Apr 21, 2015 at 6:06 PM, Ravikumar Govindarajan <
[email protected]> wrote:

> I was just looking at the Blur warmup logic. I could classify it in 2 stages...
>
> Stage I
> It looks like openShard [DistributedIndexServer] submits the warmup request
> on a separate warmupExecutor. This is exactly what is needed for loading an
> auto-saved block-cache from HDFS...
>
> Stage II
> But when I prodded a little deeper, it got complex.
> TraceableDirectory, IndexTracer with thread-local stuff etc... I could not
> follow the code...
>
> I decided on an impl as follows...
>
> public class BlockCacheWarmup extends BlurIndexWarmup {
>
>   @Override
>   public void warmBlurIndex(final TableDescriptor table, final String shard,
>       IndexReader reader, AtomicBoolean isClosed, ReleaseReader releaseReader,
>       AtomicLong pauseWarmup) throws IOException {
>     for (each segment) {
>       for (each file) {
>         // Read cache metadata from HDFS...
>         // Directly open CacheIndexInput and populate the block-cache
>       }
>     }
>   }
> }
>
> My question is...
>
> If I explicitly bypass Stage II {TraceableDirectory and friends} and just
> populate the block-cache alone, will this work fine? Am I missing something
> obvious?
>
> Any help is much appreciated...
>
> --
> Ravi
>
> On Fri, Feb 6, 2015 at 2:58 PM, Aaron McCurry <[email protected]> wrote:
>
>> Yes, exactly. That way we could provide a set of blocks to be cached with
>> priority, so the most important bits get cached first.
>>
>> Aaron
>>
>> On Fri, Feb 6, 2015 at 12:43 AM, Ravikumar Govindarajan <
>> [email protected]> wrote:
>>
>>> That's a great idea...
>>>
>>> You mean that instead of saving the blocks themselves, we can store
>>> metadata {block-ids} for each file/shard in HDFS that is written to the
>>> block-cache...
>>>
>>> Opening a shard can then use this metadata to re-populate the hot parts
>>> of the files...
>>>
>>> We also need to handle evictions & file-deletes...
>>> Is this what you are hinting at?
>>>
>>> --
>>> Ravi
>>>
>>> On Thu, Feb 5, 2015 at 7:03 PM, Aaron McCurry <[email protected]> wrote:
>>>
>>>> On Thu, Feb 5, 2015 at 6:30 AM, Ravikumar Govindarajan <
>>>> [email protected]> wrote:
>>>>
>>>>> I noticed in the BigTable impl of Cassandra that they store the
>>>>> "Memtable" info periodically onto disk to avoid cold start-ups...
>>>>>
>>>>> Is it possible to do something like that for Blur's block-cache,
>>>>> preferably in HDFS itself, so that both cold start-ups and shard
>>>>> take-overs don't affect end-user latencies...
>>>>>
>>>>> In Cassandra's case, the size of the Memtable will typically be
>>>>> 2 GB-4 GB. But in the case of Blur, it could even be 100 GB. So I
>>>>> don't know if attempting such stuff is a good idea.
>>>>>
>>>>> Any help is much appreciated...
>>>>
>>>> Yeah, I agree that caches could be very large, and storing them in HDFS
>>>> could be counter-productive. Also, the block cache represents what is on
>>>> the single node, and it's not really broken up by shard or table. So if a
>>>> node was restarted without a full cluster restart, there's no guarantee
>>>> that the shard server will get the same shards back that it was serving
>>>> before.
>>>>
>>>> I like the idea, though. Perhaps we could write out what parts of what
>>>> files the cache was storing, in LRU order. Then any server that is
>>>> opening the shard can know what parts of what files were hot the last
>>>> time it was open. Then it could choose to populate the cache upon shard
>>>> opening.
>>>>
>>>> Thoughts?
>>>>
>>>> Aaron
>>>>
>>>>> --
>>>>> Ravi
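To make the metadata idea from the thread concrete -- persisting only {file, block-id} pairs in LRU order rather than the cached blocks themselves, then replaying them on shard open -- here is a minimal, self-contained sketch. All names here (BlockCacheMetadata, encode, decode) are hypothetical stand-ins, not Blur APIs; a real implementation would hang off BlurIndexWarmup, write the metadata to HDFS, and warm blocks by reading through CacheIndexInput so the block-cache is populated as a side effect.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: instead of saving cached blocks (potentially 100 GB),
 * save only {file-name, block-id} metadata in LRU order and replay it on
 * startup. Local strings stand in for the HDFS file used in practice.
 */
public class BlockCacheMetadata {

  /** Serialize hot-block ids per file, most-recently-used first. */
  static String encode(Map<String, List<Long>> hotBlocks) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, List<Long>> e : hotBlocks.entrySet()) {
      sb.append(e.getKey()).append('=');
      for (int i = 0; i < e.getValue().size(); i++) {
        if (i > 0) sb.append(',');
        sb.append(e.getValue().get(i));
      }
      sb.append('\n');
    }
    return sb.toString();
  }

  /** Parse the metadata back into file -> ordered block-id list. */
  static Map<String, List<Long>> decode(String data) {
    Map<String, List<Long>> out = new LinkedHashMap<>();
    for (String line : data.split("\n")) {
      if (line.isEmpty()) continue;
      int eq = line.indexOf('=');
      List<Long> ids = new ArrayList<>();
      String idPart = line.substring(eq + 1);
      if (!idPart.isEmpty()) {
        for (String id : idPart.split(",")) {
          ids.add(Long.parseLong(id));
        }
      }
      out.put(line.substring(0, eq), ids);
    }
    return out;
  }

  public static void main(String[] args) {
    // On shutdown (or periodically): dump what the cache held, LRU-ordered.
    Map<String, List<Long>> hot = new LinkedHashMap<>();
    hot.put("_0.tim", List.of(3L, 1L, 7L));
    hot.put("_0.doc", List.of(0L));
    String persisted = encode(hot); // would be written to HDFS in practice

    // On shard open: replay the metadata, warming the hottest blocks first.
    for (Map.Entry<String, List<Long>> e : decode(persisted).entrySet()) {
      for (long blockId : e.getValue()) {
        // A real warmup would seek a CacheIndexInput to blockId * blockSize
        // and read, populating the block-cache as a side effect.
        System.out.println("warm " + e.getKey() + " block " + blockId);
      }
    }
  }
}
```

Keeping the ids in LRU order is what lets a warmup honor Aaron's point about priority: the most important blocks get cached first, and the replay can simply stop early if the cache fills or the shard starts serving queries.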
