Thanks for the clarifications, Aaron.

On Thu, Dec 8, 2016 at 2:53 AM, Aaron McCurry <[email protected]> wrote:
> Solr uses the original block cache that was created in Blur. As for
> locking, the only locking code in the read path should be in the cache map
> itself and in the HDFS client code. I believe both have some form of java
> locks, likely the HDFS client will be far worse for performance. The block
> cache itself should be lock free.
>
> Aaron
>
> On Fri, Dec 2, 2016 at 8:10 AM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > Ok, I forgot to include this link. Don't know what version Cloudera is on
> > w.r.t BlockCache, but they are claiming that using it during merges results
> > in critical-section (allocation) lock causing meltdown...
> >
> > https://blog.cloudera.com/blog/2016/08/resolving-lock-contention-in-apache-solr-a-performance-analysis-detective-story/
> >
> >
> > Will this hold good for the latest BlockCache version of Blur too?
> >
> > On Fri, Dec 2, 2016 at 6:20 PM, Ravikumar Govindarajan <
> > [email protected]> wrote:
> >
> > > One thing I was wondering is, does block-cache acquire locks of any kind
> > > during reads?
> > >
> > > I don't use the 'read-then-cache' construct at all, so was just thinking
> > > if it is fine to eliminate locks (if any) on the read path
> > >
> > >
> > > On Mon, Oct 24, 2016 at 7:07 PM, Aaron McCurry <[email protected]> wrote:
> > >
> > >> On Fri, Oct 21, 2016 at 1:41 AM, Ravikumar Govindarajan <
> > >> [email protected]> wrote:
> > >>
> > >> > Our application makes use of 'write-thru-block-cache' only. During
> > >> > search/merge-reads, we have modified block-cache code to only probe the
> > >> > block-cache and avoid inserting to it.
> > >> >
> > >> > In such a usage scenario, I was thinking about introducing a
> > >> > 'readBufferSize' (default=1KB) in CacheIndexInput. From block-cache or
> > >> > underlying file we read only 'readBufferSize' data & adjust counters
> > >> > accordingly when it's a short-circuit read...
> > >> >
> > >> > You think it could be made workable?
> > >> >
> > >>
> > >> Yeah it should be.
> > >>
> > >>
> > >> >
> > >> > > Another idea could be to bypass the cache directory during merges and read
> > >> > > directly from the hdfsdirectory. Then perhaps you could take advantage of
> > >> > > the SC reads without having to deal with the cache directly.
> > >> >
> > >> >
> > >> > This is what we are currently evaluating & it looks to be a safe bet
> > >> >
> > >>
> > >> Ok, let me know if you have any questions.
> > >>
> > >>
> > >> >
> > >> > --
> > >> > Ravi
> > >> >
> > >> > On Fri, Oct 21, 2016 at 3:26 AM, Aaron McCurry <[email protected]> wrote:
> > >> >
> > >> > > In my experience I too have used block cache sizes in the 64KB range for the
> > >> > > same reasons you listed. The biggest of which was because we were running
> > >> > > upwards of 100GB caches and 1K block cache sizes are not really possible at
> > >> > > that size. The biggest problem with the compaction is with the .tim file,
> > >> > > the rest of the files are mostly sequential reads, but because that file is
> > >> > > a tree it tends to jump all over the place during compaction. I would
> > >> > > recommend if you want to speed up compaction (merges) to allow the tim
> > >> > > files to be put into block cache during the merge (i.e. turn quiet reads
> > >> > > off for those files). This of course could flood your cache with data that
> > >> > > you are about to remove, so if you have the cache space it's the easiest
> > >> > > solution.
> > >> > >
> > >> > > Another idea could be to bypass the cache directory during merges and read
> > >> > > directly from the hdfsdirectory. Then perhaps you could take advantage of
> > >> > > the SC reads without having to deal with the cache directly.
> > >> > >
> > >> > > Aaron
> > >> > >
> > >> > > On Thu, Oct 20, 2016 at 3:53 AM, Ravikumar Govindarajan <
> > >> > > [email protected]> wrote:
> > >> > >
> > >> > > > We have set a fairly large cacheSize of 64KB in block-cache for avoiding
> > >> > > > too many keys, gc pressure etc...
> > >> > > >
> > >> > > > But CacheIndexInput tries to read 64KB of data during a cache-miss & fills
> > >> > > > up the CacheValue. When doing short-circuit-reads, this could turn out to
> > >> > > > be excessive no? For a comparison, lucene uses only 1KB buffers for the
> > >> > > > same..
> > >> > > >
> > >> > > > Do you think this will likely affect performance of searches albeit in a
> > >> > > > minor way?
> > >> > > >
> > >> > > > --
> > >> > > > Ravi
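
For anyone following the 'readBufferSize' idea discussed above, here is a rough, language-neutral sketch in Python of a probe-only read path: the cache is consulted but never filled on a miss, and a miss fetches at most a small buffer from the underlying file rather than a whole 64KB block. This is not Blur's CacheIndexInput code. The class ProbeOnlyInput, its counters, and the dict used as a cache are all made up for illustration, and an in-memory byte string stands in for the HDFS file.

```python
BLOCK_SIZE = 64 * 1024     # cache block size mentioned in the thread
READ_BUFFER_SIZE = 1024    # hypothetical 'readBufferSize' default (1KB)


class ProbeOnlyInput:
    """Toy index input that probes a block cache but never inserts into it."""

    def __init__(self, data, cache, read_buffer_size=READ_BUFFER_SIZE):
        self.data = data                  # stands in for the underlying file
        self.cache = cache                # block_id -> bytes, filled only by the write path
        self.read_buffer_size = read_buffer_size
        self.cache_hits = 0
        self.cache_misses = 0
        self.bytes_from_file = 0          # bytes fetched from the file on misses

    def read(self, position, length):
        out = bytearray()
        end = min(position + length, len(self.data))
        while position < end:
            block_id, offset = divmod(position, BLOCK_SIZE)
            # Never cross a block boundary in a single step.
            want = min(end - position, BLOCK_SIZE - offset)
            block = self.cache.get(block_id)      # probe only, no insert on a miss
            if block is not None:
                self.cache_hits += 1
                chunk = block[offset:offset + want]
            else:
                self.cache_misses += 1
                # Fetch at most read_buffer_size bytes, not the whole block.
                want = min(want, self.read_buffer_size)
                chunk = self.data[position:position + want]
                self.bytes_from_file += len(chunk)
            if not chunk:                         # guard against a truncated cache entry
                break
            out += chunk
            position += len(chunk)
        return bytes(out)
```

With a 1KB buffer, a 3000-byte read that misses the cache is served in three small file reads and leaves the cache untouched. Whether that actually helps short-circuit reads is something only measurement on a real cluster would show.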
