Ok, I forgot to include this link. Don't know what version Cloudera is on w.r.t. the BlockCache, but they are claiming that using it during merges causes contention on a critical-section (allocation) lock and leads to a meltdown...
https://blog.cloudera.com/blog/2016/08/resolving-lock-contention-in-apache-solr-a-performance-analysis-detective-story/

Will this also hold for the latest BlockCache version in Blur?

On Fri, Dec 2, 2016 at 6:20 PM, Ravikumar Govindarajan <[email protected]> wrote:

> One thing I was wondering is, does the block-cache acquire locks of any kind
> during reads?
>
> I don't use the 'read-then-cache' construct at all, so was just thinking
> whether it is fine to eliminate locks (if any) on the read path.
>
> On Mon, Oct 24, 2016 at 7:07 PM, Aaron McCurry <[email protected]> wrote:
>
>> On Fri, Oct 21, 2016 at 1:41 AM, Ravikumar Govindarajan
>> <[email protected]> wrote:
>>
>> > Our application makes use of the 'write-thru-block-cache' only. During
>> > search/merge reads, we have modified the block-cache code to only probe
>> > the block-cache and avoid inserting into it.
>> >
>> > In such a usage scenario, I was thinking about introducing a
>> > 'readBufferSize' (default=1KB) in CacheIndexInput. From the block-cache
>> > or the underlying file we read only 'readBufferSize' bytes of data &
>> > adjust counters accordingly when it's a short-circuit read...
>> >
>> > Do you think it could be made workable?
>>
>> Yeah, it should be.
>>
>> > > Another idea could be to bypass the cache directory during merges and
>> > > read directly from the hdfsdirectory. Then perhaps you could take
>> > > advantage of the SC reads without having to deal with the cache
>> > > directly.
>> >
>> > This is what we are currently evaluating & it looks to be a safe bet.
>>
>> Ok, let me know if you have any questions.
>>
>> > --
>> > Ravi
>> >
>> > On Fri, Oct 21, 2016 at 3:26 AM, Aaron McCurry <[email protected]> wrote:
>> >
>> > > In my experience I too have used block cache sizes in the 64KB range
>> > > for the same reasons you listed, the biggest of which was that we were
>> > > running caches upwards of 100GB, and 1KB block cache sizes are not
>> > > really possible at that scale. The biggest problem with the compaction
>> > > is the .tim file; the rest of the files are mostly sequential reads,
>> > > but because that file is a tree it tends to jump all over the place
>> > > during compaction. If you want to speed up compaction (merges), I would
>> > > recommend allowing the .tim files to be put into the block cache during
>> > > the merge (i.e. turn quiet reads off for those files). This of course
>> > > could flood your cache with data that you are about to remove, so if
>> > > you have the cache space it's the easiest solution.
>> > >
>> > > Another idea could be to bypass the cache directory during merges and
>> > > read directly from the hdfsdirectory. Then perhaps you could take
>> > > advantage of the SC reads without having to deal with the cache
>> > > directly.
>> > >
>> > > Aaron
>> > >
>> > > On Thu, Oct 20, 2016 at 3:53 AM, Ravikumar Govindarajan
>> > > <[email protected]> wrote:
>> > >
>> > > > We have set a fairly large cacheSize of 64KB in the block-cache to
>> > > > avoid too many keys, GC pressure etc...
>> > > >
>> > > > But CacheIndexInput tries to read 64KB of data during a cache miss
>> > > > and fills up the CacheValue. When doing short-circuit reads, this
>> > > > could turn out to be excessive, no? For comparison, Lucene uses only
>> > > > 1KB buffers for the same...
>> > > >
>> > > > Do you think this will likely affect the performance of searches,
>> > > > albeit in a minor way?
>> > > >
>> > > > --
>> > > > Ravi
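For reference, a rough sketch of the 'readBufferSize' idea discussed in the quoted thread. This is not Blur's actual CacheIndexInput code; the BlockProbe interface, READ_BUFFER_SIZE constant and readBytes helper are hypothetical names, and only Lucene's IndexInput is real. The point is simply to probe the cache and, on a miss, read only a small buffer from the underlying (short-circuit) input with no cache insert:

import java.io.IOException;

import org.apache.lucene.store.IndexInput;

// Hypothetical sketch only -- not Blur's actual CacheIndexInput API.
class ReadBufferSizeSketch {

  /** Stand-in for the block cache: returns bytes copied, or 0 on a miss. */
  interface BlockProbe {
    int tryCopy(long position, byte[] dst, int offset, int length);
  }

  static final int READ_BUFFER_SIZE = 1024; // proposed 1KB default

  static void readBytes(BlockProbe cache, IndexInput underlying, long position,
      byte[] dst, int offset, int length) throws IOException {
    while (length > 0) {
      int chunk = Math.min(length, READ_BUFFER_SIZE);
      int copied = cache.tryCopy(position, dst, offset, chunk);
      if (copied == 0) {
        // Cache miss: small read straight from the file, no cache insert,
        // instead of pulling a full 64KB cache block.
        underlying.seek(position);
        underlying.readBytes(dst, offset, chunk);
        copied = chunk;
      }
      position += copied;
      offset += copied;
      length -= copied;
    }
  }
}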
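And a rough sketch of the other suggestion from the thread, bypassing the cache directory for merge reads. It assumes a Lucene version that provides FilterDirectory and IOContext.Context.MERGE; the MergeBypassDirectory name and the wiring of the cache directory vs. the raw HdfsDirectory are hypothetical, not something Blur ships:

import java.io.IOException;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FilterDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;

// Hypothetical wrapper: merge-time reads go straight to the raw
// HdfsDirectory (so short-circuit reads apply), while normal search
// reads keep going through the caching directory.
public class MergeBypassDirectory extends FilterDirectory {

  private final Directory rawHdfsDir; // the un-cached HdfsDirectory

  public MergeBypassDirectory(Directory cacheDir, Directory rawHdfsDir) {
    super(cacheDir); // everything else delegates to the cache directory
    this.rawHdfsDir = rawHdfsDir;
  }

  @Override
  public IndexInput openInput(String name, IOContext context) throws IOException {
    if (context.context == IOContext.Context.MERGE) {
      // Bypass the block cache entirely for merge reads.
      return rawHdfsDir.openInput(name, context);
    }
    return in.openInput(name, context);
  }
}

Only openInput is overridden, so the rest of the Directory API still behaves exactly as the cache directory does; merges simply never touch (or pollute) the block cache.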
