> Is this the code for the legacy short circuit reads or the newer version
> that uses named pipes?
The legacy short-circuit reads use domain sockets. They have numerous
perf issues, as documented here:
https://issues.apache.org/jira/browse/HDFS-347

The Mmap APIs are the latest. They are referred to as "zero-copy reads"
and don't suffer from any of the problems associated with the legacy
short-circuit reads. The only thing I find missing is that "unmap"
control of blocks is vested with the hadoop-client...
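For reference, this is roughly what the new read path looks like from
the client side. An untested sketch against the hadoop-2.3.0 APIs; the
file path and read size are made up:

import java.nio.ByteBuffer;
import java.util.EnumSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.ReadOption;
import org.apache.hadoop.io.ElasticByteBufferPool;

public class ZeroCopyReadSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    ElasticByteBufferPool pool = new ElasticByteBufferPool();
    // Made-up path, purely for illustration
    FSDataInputStream in = fs.open(new Path("/blur/tables/t1/shard-0/_0.fdt"));
    try {
      // Returns an mmapped ByteBuffer when the block is local and
      // checksums can be skipped; falls back to a normal copying read
      // otherwise. Returns null at EOF.
      ByteBuffer buf = in.read(pool, 128 * 1024,
          EnumSet.of(ReadOption.SKIP_CHECKSUMS));
      if (buf != null) {
        try {
          // ... consume buf ...
        } finally {
          // The only "unmap" control the client gets: dropping its
          // ref-count on the block. The actual munmap happens later,
          // inside hadoop.
          in.releaseBuffer(buf);
        }
      }
    } finally {
      in.close();
    }
  }
}

Note that releaseBuffer() only decrements the ref-count; once it hits
zero, the block merely becomes eligible for unmapping by the reaper
thread. Also, if I read the code right, the mmap path only kicks in
when short-circuit reads are enabled (dfs.client.read.shortcircuit
plus dfs.domain.socket.path).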
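On the blur side, the control we'd want looks roughly like this: the
original IndexInput and all its clones share one ref-count, and the
mmapped buffer is handed back only when the last of them is closed.
This is just a sketch of the idea, not the current HdfsDirectory code;
the class is made up, it assumes a Lucene 4.x-era IndexInput, and it
pretends the whole file is a single local block served by one
zero-copy read:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.EnumSet;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.ReadOption;
import org.apache.hadoop.io.ElasticByteBufferPool;
import org.apache.lucene.store.IndexInput;

class ZeroCopyIndexInput extends IndexInput {
  private final FSDataInputStream stream;   // owned by the directory
  private final ByteBuffer block;           // the shared mmapped block
  private final AtomicInteger refCount;     // shared by original + clones
  private final ByteBuffer slice;           // per-instance position
  private boolean closed;

  ZeroCopyIndexInput(String name, FSDataInputStream stream,
      ElasticByteBufferPool pool, int blockSize) throws IOException {
    super(name);
    this.stream = stream;
    // One zero-copy read pins the local block's mmap in the hdfs client
    this.block = stream.read(pool, blockSize,
        EnumSet.of(ReadOption.SKIP_CHECKSUMS));
    this.refCount = new AtomicInteger(1);
    this.slice = block.duplicate();
  }

  private ZeroCopyIndexInput(ZeroCopyIndexInput other) {
    super(other.toString());
    this.stream = other.stream;
    this.block = other.block;
    this.refCount = other.refCount;
    refCount.incrementAndGet();             // each clone holds a reference
    this.slice = other.block.duplicate();
    this.slice.position(other.slice.position());
  }

  @Override public byte readByte() { return slice.get(); }
  @Override public void readBytes(byte[] b, int offset, int len) {
    slice.get(b, offset, len);
  }
  @Override public long getFilePointer() { return slice.position(); }
  @Override public void seek(long pos) { slice.position((int) pos); }
  @Override public long length() { return block.remaining(); }
  @Override public IndexInput clone() { return new ZeroCopyIndexInput(this); }

  @Override
  public void close() throws IOException {
    if (closed) {
      return;
    }
    closed = true;
    // Unmap control lives here: only the last close, whether of the
    // original or of a clone, hands the buffer back to hadoop
    if (refCount.decrementAndGet() == 0) {
      stream.releaseBuffer(block);
    }
  }
}

A real implementation would of course have to span block boundaries
and fall back to copying reads for remote blocks.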
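Also, re the HDFS cache feature discussed further down the thread:
pinning a shard's files into datanode memory is only a couple of calls
against the 2.3.0 client. Untested, and the pool/path names are made
up; the datanodes would additionally need dfs.datanode.max.locked.memory
set before they can mlock anything:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;
import org.apache.hadoop.hdfs.protocol.CachePoolInfo;

public class CacheShardSketch {
  public static void main(String[] args) throws Exception {
    DistributedFileSystem dfs =
        (DistributedFileSystem) FileSystem.get(new Configuration());

    // One-time setup: a pool to account for blur's cached bytes
    dfs.addCachePool(new CachePoolInfo("blur"));

    // Pin one replica of every block under the shard directory
    long id = dfs.addCacheDirective(new CacheDirectiveInfo.Builder()
        .setPool("blur")
        .setPath(new Path("/blur/tables/t1/shard-0"))
        .setReplication((short) 1)
        .build());

    // Drop the pin when the shard migrates away
    dfs.removeCacheDirective(id);
  }
}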
--
Ravi

On Tue, Jun 2, 2015 at 12:50 AM, Aaron McCurry <[email protected]> wrote:

> On Wed, May 27, 2015 at 7:51 AM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > I was thinking on how blur can effectively use Mmap short-circuit
> > reads from hadoop. It's kind of long, but please bear with me...
> >
> > Checked out the hadoop-2.3.0 source. I am summarizing the logic found
> > in the DFSInputStream, ClientMmap & ClientMmapManager source files...
> >
> > 1. A new method read(ByteBufferPool bufferPool, int maxLength,
> > EnumSet<ReadOption> opts) is exposed for short-circuit Mmap reads
> >
> > 2. Local blocks are Mmapped and added to an LRU
> >
> > 3. A ref-count is maintained for every Mmapped block during reads
> >
> > 4. When the ref-count for a block drops to zero, it is unmapped. This
> > happens when the incoming read-offset jumps to a block other than the
> > current block.
> >
> > 5. Unmapping actually happens via a separate reaper thread...
> >
> > Step 4 is problematic, because we don't want hadoop to control
> > "unmapping" blocks. Ideally, blocks should be unmapped when the
> > original IndexInput and all its clones are closed from the blur side…
> >
> > If someone from the hadoop community can tell us whether such a
> > control is possible, I feel we can close any perceived perf-gaps
> > between regular lucene *MmapDirectory* and blur's *HdfsDirectory*
> >
> > It should be very trivial to change HdfsDirectory to use the Mmap
> > read apis..
>
> Is this the code for the legacy short circuit reads or the newer version
> that uses named pipes?
>
> > --
> > Ravi
> >
> > On Wed, May 27, 2015 at 12:55 PM, Ravikumar Govindarajan <
> > [email protected]> wrote:
> >
> > > My guess is
> > >> that SSDs are only going to help when the blocks for the shard are
> > >> local and short circuit reads are enabled.
> > >
> > > Yes, it's a good fit for such a use-case alone…
> > >
> > > I would not recommend disabling the block cache. However you could
> > >> likely lower the size of the cache and reduce the overall memory
> > >> footprint of Blur.
> > >
> > > Fine. Can we also scale down the machine RAM itself? [Ex: Instead
> > > of 128GB RAM, we can opt for a 64GB or 32GB RAM slot]
> > >
> > > One interesting thought would be to
> > >> try using the HDFS cache feature that is present in the most recent
> > >> versions of HDFS. I haven't tried it yet but it would be
> > >> interesting to try.
> > >
> > > I did try reading the HDFS cache code. I think it was written for
> > > the Map-Reduce use-case, where blocks are loaded in memory
> > > [basically "mmap" followed by "mlock" on data-nodes] just before
> > > computation begins and unloaded once it is done.
> > >
> > > On the short-circuit reads, I found that the HDFS client offers two
> > > options for block-reads:
> > > 1. Domain Socket
> > > 2. Mmap
> > >
> > > I think Mmap is superior and should have the same performance as
> > > lucene's MmapDirectory…
> > >
> > > --
> > > Ravi
> > >
> > > On Tue, May 26, 2015 at 8:00 PM, Aaron McCurry <[email protected]>
> > > wrote:
> > >
> > >> On Fri, May 22, 2015 at 3:33 AM, Ravikumar Govindarajan <
> > >> [email protected]> wrote:
> > >>
> > >> > Recently I am trying to consider deploying SSDs on search
> > >> > machines
> > >> >
> > >> > Each machine runs data-nodes + shard-server, and local reads of
> > >> > hadoop are leveraged….
> > >> >
> > >> > SSDs are a great fit for general lucene/solr kinds of setups.
> > >> > But for blur, I need some help…
> > >> >
> > >> > 1. Is it a good idea to consider SSDs, especially when the
> > >> > block-cache is present?
> > >>
> > >> Possibly, I don't have any hard numbers for this type of setup. My
> > >> guess is that SSDs are only going to help when the blocks for the
> > >> shard are local and short circuit reads are enabled.
> > >>
> > >> > 2. Are there any grids running blur on SSDs, and how do they
> > >> > compare to normal HDDs?
> > >>
> > >> I haven't run any at scale yet.
> > >>
> > >> > 3. Can we disable the block-cache on SSDs, especially when
> > >> > local-reads are enabled?
> > >>
> > >> I would not recommend disabling the block cache. However you could
> > >> likely lower the size of the cache and reduce the overall memory
> > >> footprint of Blur.
> > >>
> > >> > 4. Using SSDs, blur/lucene will surely be CPU bound. But I don't
> > >> > know what overheads hadoop local-reads bring to the table…
> > >>
> > >> If you are using short circuit reads I have seen performance of
> > >> local accesses nearing that of native IO. However, if Blur is
> > >> making remote HDFS calls, every call is like a cache miss. One
> > >> interesting thought would be to try using the HDFS cache feature
> > >> that is present in the most recent versions of HDFS. I haven't
> > >> tried it yet but it would be interesting to try.
> > >>
> > >> > Any help is much appreciated, because I cannot find any info on
> > >> > the web about this topic
> > >> >
> > >> > --
> > >> > Ravi
