Good stuff! Thanks for sharing! One issue I have found with the
short-circuit reads:

https://issues.apache.org/jira/browse/HBASE-8143

Basically you need to turn the buffer size down. The hdfs property is:

dfs.client.read.shortcircuit.buffer.size

Aaron
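For reference, a minimal hdfs-site.xml sketch pulling together the
short-circuit properties discussed in this thread. The values are
illustrative starting points rather than tuned recommendations, and the
domain socket path is an assumption:

    <!-- Illustrative hdfs-site.xml fragment; values are examples,
         not recommendations. -->
    <configuration>
      <!-- Enable short-circuit local reads; requires a datanode
           domain socket. The path below is an assumed example. -->
      <property>
        <name>dfs.client.read.shortcircuit</name>
        <value>true</value>
      </property>
      <property>
        <name>dfs.domain.socket.path</name>
        <value>/var/lib/hadoop-hdfs/dn_socket</value>
      </property>
      <!-- Per-stream read buffer. The default is 1 MB; HBASE-8143 is
           about many cached streams exhausting memory at that size,
           hence "turning it down" (128 KB here as an example). -->
      <property>
        <name>dfs.client.read.shortcircuit.buffer.size</name>
        <value>131072</value>
      </property>
      <!-- Max number of cached short-circuit streams (default 256). -->
      <property>
        <name>dfs.client.read.shortcircuit.streams.cache.size</name>
        <value>256</value>
      </property>
      <!-- Idle expiry for cached streams, in ms (default 300000). -->
      <property>
        <name>dfs.client.read.shortcircuit.streams.cache.expiry.ms</name>
        <value>300000</value>
      </property>
    </configuration>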
On Mon, Sep 14, 2015 at 6:42 AM, Ravikumar Govindarajan <
[email protected]> wrote:

> Finally we are done with testing with short-circuit reads and the
> SSD_One policy. Summarizing a few crucial points we observed during
> query-runs:
>
> 1. A single read issued by the hadoop-client takes on average
>    0.15-0.25 ms for a 32KB byte-size. Sometimes this could be on the
>    higher side, like 0.6-0.65 ms per read… Actual SSD latencies from
>    iostat were around 0.1 ms, with spikes of 0.6 ms.
>
> 2. The overhead of the hadoop wrapper code involved in SSD-reads is
>    minimal & negligible. However, we tested with a single thread;
>    when multiple threads are involved during queries, hadoop could
>    be a spoiler.
>
> 3. It still makes sense to retain the block-cache. Assuming a bad
>    query makes about 1000 trips to hadoop, time consumed ~=
>    0.15 * 1000 = 150 ms. The block-cache could play a crucial role
>    here. It could also help in resolving multi-threaded accesses.
>
> 4. Segment writes/merges are actually slower than on HDD, maybe
>    because of sequential reads…
>
> Overall, we found good gains, especially for queries using
> short-circuit reads combined with the block-cache.
>
> --
> Ravi
>
> On Wed, Aug 12, 2015 at 6:34 PM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > Our very basic testing with the SSD_One policy works as expected.
> > Now we are moving on to test the efficiency of SSD reads via
> > hadoop.
> >
> > I see numerous params that need to be set up for hadoop
> > short-circuit reads, as documented here…
> >
> > http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.7/bk_system-admin-guide/content/ch_short-circuit-reads-hdfs.html
> >
> > For production workloads, are there any standard configs for blur?
> >
> > Especially the following params:
> >
> > 1. dfs.client.read.shortcircuit.streams.cache.size
> >
> > 2. dfs.client.read.shortcircuit.streams.cache.expiry.ms
> >
> > 3. dfs.client.read.shortcircuit.buffer.size
> >
> > On Tue, Aug 11, 2015 at 6:13 PM, Aaron McCurry <[email protected]>
> > wrote:
> >
> > > That is awesome! Let me know your results when you get a chance.
> > >
> > > Aaron
> > >
> > > On Mon, Aug 10, 2015 at 9:21 AM, Ravikumar Govindarajan <
> > > [email protected]> wrote:
> > >
> > > > Hadoop 2.7.1 is out and now handles mixed storage… A single
> > > > data-node/shard-server can run HDDs & SSDs together…
> > > >
> > > > More about this here…
> > > >
> > > > http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html
> > > >
> > > > The policy I looked for was "SSD_One". The first copy of
> > > > index-data, placed on the local machine, will be stored on SSD.
> > > > The second & third copies, stored on other machines, will be on
> > > > HDDs…
> > > >
> > > > This eliminates the need for the mixed setup using RACK1 &
> > > > RACK2 I previously thought of. Hadoop 2.7.1 helps me achieve
> > > > this in a single cluster of machines running data-nodes +
> > > > shard-servers.
> > > >
> > > > Every machine stores the primary copy on SSDs. Writes, searches
> > > > and merges all take advantage of it, while replication can be
> > > > relegated to slower but bigger-capacity HDDs. These HDDs also
> > > > serve as an online backup of the less fault-tolerant SSDs.
> > > >
> > > > We have ported our in-house blur extension to hadoop-2.7.1.
> > > > Will update on test results shortly.
> > > >
> > > > --
> > > > Ravi
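The policy described above is spelled ONE_SSD (also written One_SSD) in
the HDFS ArchivalStorage docs; "SSD_One" in this thread refers to the
same thing. A minimal sketch of applying it, assuming a hypothetical
/blur/tables directory:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class ApplyOneSsdPolicy {
      public static void main(String[] args) throws Exception {
        // Assumes fs.defaultFS points at the HDFS namenode.
        FileSystem fs = FileSystem.get(new Configuration());
        if (fs instanceof DistributedFileSystem) {
          DistributedFileSystem dfs = (DistributedFileSystem) fs;
          // ONE_SSD: one replica on SSD storage, remaining replicas
          // on DISK -- the behavior described above.
          // "/blur/tables" is a hypothetical table directory.
          dfs.setStoragePolicy(new Path("/blur/tables"), "ONE_SSD");
        }
      }
    }

The CLI equivalent in hadoop 2.7.1 would be along the lines of:

    hdfs storagepolicies -setStoragePolicy -path /blur/tables \
      -policy ONE_SSD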
On Mon, Jun 22, 2015 at 6:18 PM, Aaron McCurry <[email protected]> wrote:

> On Thu, Jun 18, 2015 at 8:55 AM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > Apologies for resurrecting this thread…
> >
> > One problem with lucene is OS buffer-cache pollution during segment
> > merges, as documented here:
> >
> > http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html
> >
> > This problem could occur in Blur when short-circuit reads are
> > enabled...
>
> True, but Blur deals with this issue by not allowing (by default) the
> merges to affect the Block Cache.
>
> > My take on this…
> >
> > It may be possible to overcome the problem by simply re-directing
> > merge-read requests to a node other than the local node, instead of
> > fancy stuff like O_DIRECT, FADVISE etc...
>
> I have always thought of having the merge occur in a Mapreduce (or
> Yarn) job instead of locally.
>
> > In a mixed setup, this means merge requests need to be diverted to
> > low-end Rack2 machines {running only data-nodes} while
> > short-circuit read requests will continue to be served from
> > high-end Rack1 machines {running both shard-server and data-nodes}.
> >
> > Hadoop 2.x provides a cool read-API, "seekToNewSource". The API
> > documentation says "Seek to given position on a node other than the
> > current node".
> >
> > From blur code, it's just enough if we open a new FSDataInputStream
> > for merge-reads and issue a seekToNewSource call. Once merges are
> > done, it can be closed & discarded…
> >
> > Please let me know your viewpoints on this…
>
> We could do this, but I find that reading the TIM file types over the
> wire during a merge causes a HUGE slowdown in merge performance. The
> fastest way to merge is to copy the TIM files involved in the merge
> locally, run the merge, and then delete them after the fact.
>
> Aaron
>
> > --
> > Ravi
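A minimal sketch of the seekToNewSource idea described above, assuming
a hypothetical segment file path; this only illustrates the API call,
not Blur's actual merge path:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class MergeReadFromOtherNode {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical segment file; Blur's real layout may differ.
        Path seg = new Path("/blur/tables/t1/shard-0/_42.tim");
        try (FSDataInputStream in = fs.open(seg)) {
          // Re-seek to the same position on a datanode other than the
          // one currently serving the stream; returns false when no
          // alternative source exists (e.g. replication factor 1).
          boolean switched = in.seekToNewSource(0L);
          byte[] buf = new byte[32 * 1024];
          int n = in.read(buf, 0, buf.length);
          System.out.println("switched=" + switched + ", read=" + n);
        }
      }
    }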
On Mon, Mar 9, 2015 at 5:45 PM, Ravikumar Govindarajan <
[email protected]> wrote:

> On Sat, Mar 7, 2015 at 11:00 AM, Aaron McCurry <[email protected]>
> wrote:
>
> > I thought the normal hdfs replica rules were: one local, one on a
> > remote rack, one on that same rack.
>
> Yes. One copy is local & the other two copies are on the same remote
> rack.
>
> > How did you land on your current configuration?
>
> When I was evaluating the disk-budget, we were looking at 6 expensive
> drives per machine. It led me to think about what those 6 drives
> would do & how we could reduce the cost. Then I stumbled on this
> two-rack setup, and now we need only 2 such drives...
>
> Apart from the reduced disk-budget & write-overhead on the cluster,
> it also helps with greater availability, as a rack-failure would be
> recoverable...
>
> --
> Ravi
