> My guess is that SSDs are only going to help when the blocks for the shard are local
> and short circuit reads are enabled.
Yes, it's a good fit for that use case alone…

> I would not recommend disabling the block cache. However you could likely
> lower the size of the cache and reduce the overall memory footprint of
> Blur.

Fine. Can we also scale down the machine RAM itself? [e.g., instead of 128GB of RAM, we could opt for 64GB or 32GB]

> One interesting thought would be to try using the HDFS cache feature that
> is present in the most recent versions of HDFS. I haven't tried it yet but
> it would be interesting to try.

I did try reading the HDFS cache code. I think it was written for the MapReduce use case, where blocks are loaded into memory [basically "mmap" followed by "mlock" on the data nodes] just before computation begins and unloaded once it is done.

On short-circuit reads, I found that the HDFS client offers 2 options for block reads:

1. Domain socket
2. Mmap

I think Mmap is superior and should perform about the same as Lucene's MMapDirectory…

--
Ravi

On Tue, May 26, 2015 at 8:00 PM, Aaron McCurry <[email protected]> wrote:

> On Fri, May 22, 2015 at 3:33 AM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > Recently I have been considering deploying SSDs on search machines.
> >
> > Each machine runs a data node + shard server, and Hadoop local reads are
> > leveraged….
> >
> > SSDs are a great fit for general Lucene/Solr kinds of setups. But for
> > Blur, I need some help…
> >
> > 1. Is it a good idea to consider SSDs, especially when the block cache is
> > present?
> >
>
> Possibly. I don't have any hard numbers for this type of setup. My guess is
> that SSDs are only going to help when the blocks for the shard are local
> and short circuit reads are enabled.
>
> > 2. Are there any grids running Blur on SSDs, and how do they compare to
> > normal HDDs?
> >
>
> I haven't run any at scale yet.
>
> > 3. Can we disable the block cache on SSDs, especially when local reads
> > are enabled?
> >
>
> I would not recommend disabling the block cache.
> However you could likely lower the size of the cache and reduce the overall
> memory footprint of Blur.
>
> > 4. With SSDs, Blur/Lucene will surely be CPU bound. But I don't know what
> > overhead Hadoop local reads bring to the table…
> >
>
> If you are using short circuit reads, I have seen the performance of local
> accesses approach that of native IO. However, if Blur is making remote HDFS
> calls, every call is like a cache miss. One interesting thought would be to
> try using the HDFS cache feature that is present in the most recent
> versions of HDFS. I haven't tried it yet but it would be interesting to
> try.
>
> >
> > Any help is much appreciated because I cannot find any info on this topic
> > on the web.
> >
> > --
> > Ravi
> >
>
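To illustrate why shrinking the block cache is a trade-off rather than a free memory saving, here is a minimal, hypothetical LRU block cache in Java. It is not Blur's BlockCache implementation; the class name `LruBlockCache`, the block size, and the `loader` callback (standing in for a short-circuit local read or a remote HDFS read on a miss) are all assumptions made for illustration. It replays the same access pattern against two cache sizes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

/**
 * Hypothetical, minimal LRU block cache. NOT Blur's BlockCache: it only
 * illustrates how cache size affects the hit/miss ratio.
 */
class LruBlockCache {
    private final Map<Long, byte[]> blocks;
    // Called on a miss; stands in for a local (short-circuit) or remote HDFS read.
    private final LongFunction<byte[]> loader;
    private long hits;
    private long misses;

    LruBlockCache(int maxBlocks, LongFunction<byte[]> loader) {
        this.loader = loader;
        // accessOrder=true: iteration order is least-recently-used first,
        // so removeEldestEntry evicts the LRU block once the limit is exceeded.
        this.blocks = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxBlocks;
            }
        };
    }

    byte[] read(long blockId) {
        byte[] data = blocks.get(blockId);
        if (data != null) {
            hits++;
            return data;
        }
        misses++;
        data = loader.apply(blockId); // every miss pays the cost of the underlying read
        blocks.put(blockId, data);
        return data;
    }

    long hits() { return hits; }
    long misses() { return misses; }

    public static void main(String[] args) {
        // Cyclic access over a working set of 4 blocks, replayed 3 times.
        long[] pattern = {0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3};
        for (int size : new int[] {4, 2}) {
            LruBlockCache cache = new LruBlockCache(size, id -> new byte[8192]);
            for (long id : pattern) {
                cache.read(id);
            }
            System.out.println("maxBlocks=" + size
                    + " hits=" + cache.hits() + " misses=" + cache.misses());
        }
    }
}
```

With a working set of four blocks, the 4-slot cache serves 8 of the 12 reads from memory, while the 2-slot cache misses on every read, because cyclic access defeats LRU once the cache is smaller than the working set. Real query workloads are unlikely to be this regular, so the effect of a smaller cache (or less machine RAM) on a given Blur setup would still need to be measured.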
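The two short-circuit read styles compared above can be approximated on a plain local file with `java.nio`: a copying read through `FileChannel.read` into a buffer, and a memory-mapped read via `FileChannel.map`, which is the mechanism Lucene's MMapDirectory builds on. This is only an analogy, assuming the domain-socket option behaves like an ordinary buffered read once the block file has been opened; it does not call the HDFS client, and the class and method names are hypothetical. Both methods compute the same byte checksum, so they could serve as a starting point for timing the two approaches on real block files.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical comparison of a copying read and an mmap read of a local "block" file. */
class LocalBlockReads {
    // Copying read: each chunk is copied from the kernel into a heap buffer.
    static long sumWithCopy(Path file) throws IOException {
        long sum = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(4096);
            while (ch.read(buf) != -1) {
                buf.flip();
                while (buf.hasRemaining()) {
                    sum += buf.get() & 0xFF;
                }
                buf.clear();
            }
        }
        return sum;
    }

    // Memory-mapped read: the file's pages are mapped into the address space
    // and read from the page cache without an explicit copy into a buffer.
    static long sumWithMmap(Path file) throws IOException {
        long sum = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            while (map.hasRemaining()) {
                sum += map.get() & 0xFF;
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        // Write a 1 MiB stand-in for an HDFS block file.
        Path block = Files.createTempFile("block", ".dat");
        block.toFile().deleteOnExit();
        byte[] data = new byte[1 << 20];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        Files.write(block, data);
        System.out.println("copy=" + sumWithCopy(block) + " mmap=" + sumWithMmap(block));
    }
}
```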
