Re: CacheIndexInput cacheSize

Ravikumar Govindarajan Thu, 08 Dec 2016 22:38:00 -0800

Thanks for the clarifications, Aaron.

On Thu, Dec 8, 2016 at 2:53 AM, Aaron McCurry <[email protected]> wrote:


> Solr uses the original block cache that was created in Blur. As for
> locking, the only locking code in the read path should be in the cache map
> itself and in the HDFS client code. I believe both use some form of Java
> locks, and the HDFS client is likely to be far worse for performance. The
> block cache itself should be lock-free.
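A minimal sketch of what a lock-free probe on the read path can look like, assuming the cache map is a `java.util.concurrent.ConcurrentHashMap`. This is an illustration, not Blur's actual cache map; `CacheBlockKey` and `LockFreeLookup` are hypothetical names. `ConcurrentHashMap.get` does not block, so a reader that only probes never waits on a lock; only writers may briefly synchronize inside the map.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical cache key: (file name, block id). Not Blur's actual key class.
final class CacheBlockKey {
    private final String file;
    private final long blockId;

    CacheBlockKey(String file, long blockId) {
        this.file = file;
        this.blockId = blockId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CacheBlockKey)) return false;
        CacheBlockKey other = (CacheBlockKey) o;
        return blockId == other.blockId && file.equals(other.file);
    }

    @Override
    public int hashCode() {
        return Objects.hash(file, blockId);
    }
}

// Read path with no explicit locks: ConcurrentHashMap.get does not block, and the
// hit/miss counters use LongAdder, which is designed for many concurrent updaters.
final class LockFreeLookup {
    private final ConcurrentHashMap<CacheBlockKey, byte[]> map = new ConcurrentHashMap<>();
    final LongAdder hits = new LongAdder();
    final LongAdder misses = new LongAdder();

    // Write path (e.g. write-through while a segment is being written).
    void put(String file, long blockId, byte[] block) {
        map.put(new CacheBlockKey(file, blockId), block);
    }

    // Read path: probe only, never insert. Returns null on a miss.
    byte[] probe(String file, long blockId) {
        byte[] block = map.get(new CacheBlockKey(file, blockId));
        if (block == null) {
            misses.increment();
        } else {
            hits.increment();
        }
        return block;
    }
}
```

The HDFS client side of the read path, which Aaron expects to be the bigger cost, is outside this sketch.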
>
> Aaron
>
> On Fri, Dec 2, 2016 at 8:10 AM, Ravikumar Govindarajan <[email protected]> wrote:
>
> > Ok, I forgot to include this link. I don't know what version of BlockCache
> > Cloudera is on, but they claim that using it during merges results in
> > contention on a critical-section (allocation) lock, causing a meltdown...
> >
> > https://blog.cloudera.com/blog/2016/08/resolving-lock-contention-in-apache-solr-a-performance-analysis-detective-story/
> >
> >
> > Does this also hold for the latest BlockCache version in Blur?
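One common way to avoid an allocation critical section is to track free cache slots in a bitmap and claim them with compare-and-set. The sketch below illustrates that general technique only; it is not a description of Blur's or Solr's BlockCache code, and `SlotBitmap` is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical lock-free slot allocator: one bit per cache slot, claimed with
// compareAndSet instead of inside a synchronized critical section.
final class SlotBitmap {
    private final AtomicLongArray words;
    private final int slots;

    SlotBitmap(int slots) {
        this.slots = slots;
        this.words = new AtomicLongArray((slots + 63) / 64);
    }

    // Returns a claimed slot index, or -1 if every slot is taken.
    int claim() {
        for (int w = 0; w < words.length(); w++) {
            while (true) {
                long current = words.get(w);
                int bit = Long.numberOfTrailingZeros(~current); // lowest free bit, 64 if none
                int slot = w * 64 + bit;
                if (bit == 64 || slot >= slots) {
                    break; // this word has no usable free bit
                }
                if (words.compareAndSet(w, current, current | (1L << bit))) {
                    return slot;
                }
                // Lost a race with another thread: re-read the word and retry.
            }
        }
        return -1;
    }

    // Frees a previously claimed slot.
    void release(int slot) {
        int w = slot / 64;
        long mask = 1L << (slot % 64);
        while (true) {
            long current = words.get(w);
            if (words.compareAndSet(w, current, current & ~mask)) {
                return;
            }
        }
    }
}
```

A thread that loses a race simply re-reads the word and retries, so no thread waits on a lock; the trade-off is a linear scan of the bitmap when the cache is nearly full.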
> >
> > On Fri, Dec 2, 2016 at 6:20 PM, Ravikumar Govindarajan <[email protected]> wrote:
> >
> > > One thing I was wondering: does the block cache acquire locks of any
> > > kind during reads?
> > >
> > > I don't use the 'read-then-cache' construct at all, so I was just
> > > wondering whether it is fine to eliminate locks (if any) on the read path.
> > >
> > >
> > > On Mon, Oct 24, 2016 at 7:07 PM, Aaron McCurry <[email protected]> wrote:
> > >
> > >> On Fri, Oct 21, 2016 at 1:41 AM, Ravikumar Govindarajan <[email protected]> wrote:
> > >>
> > >> > Our application uses the 'write-thru-block-cache' only. During
> > >> > search/merge reads, we have modified the block-cache code to only
> > >> > probe the block cache and avoid inserting into it.
> > >> >
> > >> > In such a usage scenario, I was thinking about introducing a
> > >> > 'readBufferSize' (default=1KB) in CacheIndexInput. From the block cache
> > >> > or the underlying file, we would read only 'readBufferSize' bytes and
> > >> > adjust the counters accordingly when it's a short-circuit read...
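The 'readBufferSize' idea above could look roughly like the following. This is a sketch under several assumptions: `PositionalSource` stands in for the underlying HDFS file, the cache is modelled as a map from block id to block bytes that is populated only on the write path, and `ProbeOnlyInput` is a hypothetical class, not Blur's `CacheIndexInput`. On a buffer miss it probes the cache and, on a cache miss, reads only `readBufferSize` bytes from the source without inserting anything.

```java
import java.util.Map;

// Hypothetical positional reader over the underlying file (e.g. an HDFS stream).
interface PositionalSource {
    long length();

    void readFully(long pos, byte[] dst, int off, int len);
}

// Sketch of a CacheIndexInput-like reader that only probes the block cache
// (never inserts) and refills a small buffer of readBufferSize bytes.
// Assumed to be used by a single thread, like a Lucene IndexInput clone.
final class ProbeOnlyInput {
    private final Map<Long, byte[]> cache; // blockId -> block, filled on write path only
    private final PositionalSource source;
    private final int blockSize;
    private final byte[] buffer;           // length == readBufferSize
    private long bufferStart = 0;          // file offset of buffer[0]
    private int bufferLen = 0;
    private long pos = 0;

    long cacheHits = 0;
    long sourceReads = 0;
    long bytesFromSource = 0;

    ProbeOnlyInput(Map<Long, byte[]> cache, PositionalSource source,
                   int blockSize, int readBufferSize) {
        this.cache = cache;
        this.source = source;
        this.blockSize = blockSize;
        this.buffer = new byte[readBufferSize];
    }

    void seek(long newPos) {
        pos = newPos;
    }

    byte readByte() {
        if (pos < bufferStart || pos >= bufferStart + bufferLen) {
            refill();
        }
        byte b = buffer[(int) (pos - bufferStart)];
        pos++;
        return b;
    }

    private void refill() {
        long remaining = source.length() - pos;
        if (remaining <= 0) {
            throw new IndexOutOfBoundsException("read past EOF at " + pos);
        }
        long blockId = pos / blockSize;
        int offsetInBlock = (int) (pos % blockSize);
        // Never cross a block boundary, so a cache hit is a single array copy.
        int len = (int) Math.min(Math.min(buffer.length, blockSize - offsetInBlock), remaining);
        byte[] block = cache.get(blockId); // probe only
        if (block != null) {
            System.arraycopy(block, offsetInBlock, buffer, 0, len);
            cacheHits++;
        } else {
            // Cache miss: read just len bytes from the source and do not cache them.
            source.readFully(pos, buffer, 0, len);
            sourceReads++;
            bytesFromSource += len;
        }
        bufferStart = pos;
        bufferLen = len;
    }
}
```

Clamping each refill to the current block boundary keeps a cache hit to one `System.arraycopy`; the counters are plain fields because an instance is assumed to be confined to one thread.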
> > >> >
> > >> > Do you think it could be made to work?
> > >> >
> > >>
> > >> Yeah it should be.
> > >>
> > >>
> > >> >
> > >> > > Another idea could be to bypass the cache directory during merges and
> > >> > > read directly from the HdfsDirectory. Then perhaps you could take
> > >> > > advantage of the short-circuit (SC) reads without having to deal with
> > >> > > the cache directly.
> > >> >
> > >> >
> > >> > This is what we are currently evaluating, and it looks like a safe bet.
> > >> >
> > >>
> > >> Ok, let me know if you have any questions.
> > >>
> > >>
> > >> >
> > >> > --
> > >> > Ravi
> > >> >
> > >> > On Fri, Oct 21, 2016 at 3:26 AM, Aaron McCurry <[email protected]> wrote:
> > >> >
> > >> > > In my experience I too have used block cache sizes in the 64KB range
> > >> > > for the same reasons you listed. The biggest reason was that we were
> > >> > > running caches of upwards of 100GB, and 1KB block sizes are not really
> > >> > > feasible at that size. The biggest problem with compaction is the .tim
> > >> > > file. The rest of the files are mostly read sequentially, but because
> > >> > > the .tim file is a tree it tends to jump all over the place during
> > >> > > compaction. If you want to speed up compaction (merges), I would
> > >> > > recommend allowing the .tim files to be put into the block cache during
> > >> > > the merge (i.e. turn quiet reads off for those files). This of course
> > >> > > could flood your cache with data that you are about to remove, but if
> > >> > > you have the cache space it's the easiest solution.
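A sketch of the "turn quiet reads off for .tim files during merges" policy, assuming each read is tagged with a hypothetical `ReadContext` (Lucene's `IOContext` distinguishes merges with `IOContext.Context.MERGE`). `QuietReadPolicy` is illustrative, not Blur's API. Here "quiet" means a cache miss is served without populating the block cache, and regular search reads stay probe-only as in Ravi's setup.

```java
import java.util.Set;

// Hypothetical read contexts; Lucene's IOContext.Context has a similar MERGE value.
enum ReadContext { SEARCH, MERGE }

// Decides whether a cache miss should populate the block cache (not quiet)
// or be served without caching (quiet). Illustrative only, not Blur's API.
final class QuietReadPolicy {
    private final Set<String> extensionsCachedDuringMerge;

    QuietReadPolicy(Set<String> extensionsCachedDuringMerge) {
        this.extensionsCachedDuringMerge = extensionsCachedDuringMerge;
    }

    boolean isQuiet(String fileName, ReadContext ctx) {
        if (ctx != ReadContext.MERGE) {
            return true; // search reads stay probe-only
        }
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1);
        // e.g. "tim": the terms dictionary is read non-sequentially during merges.
        return !extensionsCachedDuringMerge.contains(ext);
    }
}
```

With `Collections.singleton("tim")`, .tim blocks may enter the cache during merges while every other file, and every search read, stays quiet. As noted above, the cost is cache space spent on segments that are about to be merged away.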
> > >> > >
> > >> > > Another idea could be to bypass the cache directory during merges and
> > >> > > read directly from the HdfsDirectory. Then perhaps you could take
> > >> > > advantage of the short-circuit (SC) reads without having to deal with
> > >> > > the cache directly.
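Bypassing the cache directory for merges amounts to choosing the reader by context. In Lucene this decision would typically live in `Directory.openInput(String, IOContext)`; the sketch keeps to hypothetical `RangeReader`, `OpenContext` and `MergeBypassReader` types so it stays self-contained, with the `direct` reader standing in for an `HdfsDirectory`-backed input that can take advantage of short-circuit reads.

```java
// Hypothetical minimal read interface standing in for a Lucene IndexInput source.
interface RangeReader {
    byte[] read(String file, long pos, int len);
}

// Hypothetical open context; in Lucene this information comes from IOContext.
enum OpenContext { SEARCH, MERGE }

// Routes merge reads straight to the underlying reader (e.g. one backed by
// HdfsDirectory) and all other reads through the cache directory.
final class MergeBypassReader {
    private final RangeReader cached;
    private final RangeReader direct;

    MergeBypassReader(RangeReader cached, RangeReader direct) {
        this.cached = cached;
        this.direct = direct;
    }

    RangeReader forContext(OpenContext ctx) {
        return ctx == OpenContext.MERGE ? direct : cached;
    }
}
```

Because merge reads never reach the cache, they neither pollute it nor contend with search reads for its structures.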
> > >> > >
> > >> > > Aaron
> > >> > >
> > >> > > On Thu, Oct 20, 2016 at 3:53 AM, Ravikumar Govindarajan <[email protected]> wrote:
> > >> > >
> > >> > > > We have set a fairly large cacheSize of 64KB in the block cache to
> > >> > > > avoid too many keys, GC pressure, etc.
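For a sense of scale, the number of cache entries is roughly the cache size divided by the block size. Using the 100GB cache size Aaron mentions in his reply (and assuming binary units and one entry per block), 1KB blocks need 64 times as many keys as 64KB blocks. `BlockCacheKeys` is a hypothetical helper for the arithmetic only.

```java
// Rough sizing helper: assumes one cache entry (key + value) per fixed-size block.
final class BlockCacheKeys {
    static final long KIB = 1L << 10;
    static final long GIB = 1L << 30;

    // Number of blocks (and therefore keys) needed to fill the cache.
    static long keysFor(long cacheBytes, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        return cacheBytes / blockSize;
    }
}
```

`keysFor(100 * GIB, KIB)` gives 104,857,600 entries, while `keysFor(100 * GIB, 64 * KIB)` gives 1,638,400; every one of those entries is a live key object the GC has to track.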
> > >> > > >
> > >> > > > But CacheIndexInput tries to read 64KB of data on a cache miss and
> > >> > > > fills up the CacheValue. When doing short-circuit reads, couldn't this
> > >> > > > turn out to be excessive? For comparison, Lucene uses only 1KB
> > >> > > > buffers for the same purpose.
> > >> > > >
> > >> > > > Do you think this is likely to affect search performance, albeit in a
> > >> > > > minor way?
> > >> > > >
> > >> > > > --
> > >> > > > Ravi
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
