Your idea looks great, but I'm afraid I don't fully grasp it yet. I am
still trying to understand your suggestions, so please bear with me for
a moment.
When co-locating shard-servers and datanodes:
1. Every shard-server will attempt to write a local copy.
2. Proceed with Hadoop's default settings (remote rack, etc.) for the
remaining copies; the local replica can then be served with
short-circuit reads (see the configuration sketch below).
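For reference, these are the client-side settings I am assuming for
short-circuit reads. A minimal sketch only; property names vary across
Hadoop versions, and the socket path is just an example that must match
what the datanodes are configured with:

import org.apache.hadoop.conf.Configuration;

public class ShortCircuitConf {

  // Enable HDFS short-circuit local reads on the client side
  // (HDFS-347 style settings for Hadoop 2.x).
  public static Configuration withShortCircuitReads(Configuration conf) {
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    // Example path; must match dfs.domain.socket.path on the datanodes.
    conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket");
    return conf;
  }
}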
Broadly, there are two problems when writing a file:
1. Unable to write a local copy (lack of sufficient storage, etc.).
Hadoop internally redirects the write to some other destination.
2. Shard-server failure. Since the shard-server is stateless, this will
most likely be a hardware failure or planned maintenance.
Instead of asynchronously re-balancing after every write (moving blocks
to the shard), is it possible for us to trap only these two cases?
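Roughly, I was thinking of a detection pass like the one below. Just a
sketch: the class name is made up, the shard directory is passed in,
and matching on the local hostname is an assumption on my part.

import java.net.InetAddress;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Reports which index files of a shard have blocks with no replica on
 * this shard server's datanode. Only those files would need fixing,
 * instead of re-balancing on every write.
 */
public class ShardLocalityCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    String localHost = InetAddress.getLocalHost().getHostName();
    Path shardDir = new Path(args[0]); // HDFS path of one shard's index

    for (FileStatus file : fs.listStatus(shardDir)) {
      if (file.isDirectory()) {
        continue;
      }
      BlockLocation[] blocks =
          fs.getFileBlockLocations(file, 0, file.getLen());
      int nonLocal = 0;
      for (BlockLocation block : blocks) {
        if (!Arrays.asList(block.getHosts()).contains(localHost)) {
          nonLocal++;
        }
      }
      if (nonLocal > 0) {
        System.out.println(file.getPath() + ": " + nonLocal + "/"
            + blocks.length + " blocks have no local replica");
      }
    }
  }
}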
BTW, how do we re-arrange the shard layout when shards are added or
removed? I looked at the 0.20 code and it seems to move shards around
too much. That could be detrimental to what we are discussing now,
right?
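Also, if we go with your balancer-style idea below, then for a closed
(immutable) index file maybe a plain rewrite from the hosting shard
server is enough, since the default placement puts the first replica on
the local datanode. A very rough sketch; it assumes the file is no
longer open and ignores replication and block-size overrides:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

/**
 * Rewrites one immutable index file from the shard server that hosts
 * the shard. Because default placement puts the first replica on the
 * local datanode, the rewritten file becomes locally readable again.
 */
public class RelocateFile {
  public static void relocate(FileSystem fs, Path file, Configuration conf)
      throws Exception {
    Path tmp = new Path(file.getParent(), file.getName() + ".relocating");
    // The copy runs on this shard server, so replica #1 of every new
    // block lands on the local datanode.
    FileUtil.copy(fs, file, fs, tmp, false, conf);
    // Swap the copies; HDFS rename does not overwrite, so delete first.
    fs.delete(file, false);
    fs.rename(tmp, file);
  }
}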
--
Ravi
On Sun, Oct 13, 2013 at 12:51 AM, Aaron McCurry <[email protected]> wrote:
> On Fri, Oct 11, 2013 at 11:12 AM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > I came across this interesting JIRA in Hadoop:
> > https://issues.apache.org/jira/browse/HDFS-385
> >
> > In essence, it allows us more control over where blocks are placed.
> >
> > A BlockPlacementPolicy to optimistically write all blocks of a given
> > index-file into the same set of datanodes could be highly helpful.
> >
> > Co-locating shard-servers and datanodes, along with short-circuit
> > reads, should improve performance greatly. We can always utilize the
> > file-system cache for local files. In case a given file is not served
> > locally, the shard-servers can use the block cache.
> >
> > Do you see some positives in such an approach?
> >
>
> I'm sure that there will be some performance improvements by using local
> reads for accessing HDFS. The areas where I would expect the biggest
> gains are merging and fetching data for retrieval. Even with local
> drives, search time would likely not improve, assuming the index is hot.
> One of the reasons for doing short-circuit reads is to make use of the
> OS file system cache, and since Blur already uses a filesystem cache it
> might just be duplicating functionality. Another big reason for
> short-circuit reads is to reduce network traffic.
>
> Consider the scenario where another server has gone down. If the shard
> server is in an opening state, we would have to change the Blur layout
> system so that it fails over only to the server that contains the data
> for the index. This might be problematic because the shard layout is
> based on how many shard servers are online, not on where the shards'
> data is located.
>
> So in a perfect world, if the shard fails over to the correct server, it
> would reduce the amount of network traffic (to near zero if it were
> perfect). In short, it could work, but it might be very hard.
>
> Assume for a moment that the system is not dealing with a failure and
> that the shard server is also running a datanode. HDFS places the first
> replica on the local datanode, so if we just configure Hadoop for
> short-circuit reads we would already get the benefits for data fetching
> and merging.
>
> I had another idea that I wanted to run by you.
>
> What if, instead of actively placing blocks during writes with a Hadoop
> block placement policy, we write something like the Hadoop balancer? We
> get Hadoop to move the blocks to the shard server (datanode) that is
> hosting the shard. That way it would be asynchronous, and after a
> failure or shard movement it would relocate the blocks and still allow
> short-circuit reads. Kind of like the blocks following the shard around,
> instead of deciding up front where the data for the shard should be
> located. If this is what you were thinking, I'm sorry for not
> understanding your suggestion.
>
> Aaron
>
>
>
> >
> > --
> > Ravi
> >
>