On Tue, Oct 15, 2013 at 9:25 AM, Ravikumar Govindarajan <
[email protected]> wrote:

> Your idea looks great.
>
> But I am afraid I don't fully grasp it. I am attempting to understand
> your suggestions, so please bear with me for a moment.
>
> When co-locating shard-server and data-nodes
>
> 1. Every shard-server will attempt to write a local copy
> 2. Proceed with default settings of hadoop [remote rack etc...] for next
> copies
>
> Ideally, there are 2 problems when writing a file
>
> 1. Unable to write a local copy. [Lack of sufficient storage etc...] Hadoop
> deflects the
>

True, this might happen if the cluster is low on storage and there are
local disk failures.  If the cluster is in that kind of condition and
running the normal Hadoop balancer won't help (i.e., free up local storage
by migrating blocks to another machine), then it's likely we can't do
anything to help the situation.


>     write to some other destination, internally.
> 2. Shard-server failure. Since this is stateless, it will most likely be a
> hardware failure/planned maintenance.
>

Yes this is true.


>
> Instead of asynchronously re-balancing on every write [moving blocks to the
> shard], is it possible for us to trap only the above cases?
>

Not sure what you mean here.


>
> BTW, how do we re-arrange the shard layout when shards are added/removed?
>
> I looked at the 0.20 code and it seems to move shards around too much. That
> could be detrimental to what we are discussing now, right?
>

I have fixed this issue or at least improved it.

https://issues.apache.org/jira/browse/BLUR-260

This implementation will only move the shards that are down, and will never
move any more shards than is necessary to maintain a properly balanced
shard cluster.

Basically, it recalculates an optimal layout when the number of servers
changes.  It will likely need to be improved by optimizing the placement of
each shard, i.e. figuring out which of the servers that can serve the shard
already holds the most blocks from the shard's index files.  Doing this
should minimize block movement.
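
To make that concrete, here is a rough sketch (not the BLUR-260 code, and the
class name is just illustrative) of how a layout calculation could count how
many blocks of a shard's index files each datanode host already holds.  It
assumes a Hadoop 2.x client, that a shard's index lives under a single HDFS
directory, and that shard servers are co-located with datanodes so hostnames
can be matched up.

  import java.io.IOException;
  import java.util.HashMap;
  import java.util.Map;

  import org.apache.hadoop.fs.BlockLocation;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ShardLocalityScorer {

    // Count HDFS blocks per datanode host for all files under the shard's directory.
    public static Map<String, Integer> blockCountsByHost(FileSystem fs, Path shardDir)
        throws IOException {
      Map<String, Integer> counts = new HashMap<String, Integer>();
      for (FileStatus file : fs.listStatus(shardDir)) {
        if (file.isDirectory()) {
          continue; // only plain files have block locations
        }
        BlockLocation[] locations = fs.getFileBlockLocations(file, 0, file.getLen());
        for (BlockLocation location : locations) {
          for (String host : location.getHosts()) {
            Integer current = counts.get(host);
            counts.put(host, current == null ? 1 : current + 1);
          }
        }
      }
      return counts;
    }
  }

The layout code could then prefer whichever eligible shard server's host has
the highest count.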


Also had another thought: if we follow something similar to what HBase does
and perform a full optimization on each shard once a day, that would locate
the data on the local server without us having to do any other work.  Of
course this assumes that there is enough space locally and that the index
has changed in the last 24 hours to warrant doing the merge.
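
A minimal sketch of how that daily pass could be wired up.  The optimizeTable
hook is hypothetical (the real call would go through whatever optimize/merge
entry point the shard server exposes), and the change and free-space checks
are only hinted at in comments.

  import java.util.concurrent.Executors;
  import java.util.concurrent.ScheduledExecutorService;
  import java.util.concurrent.TimeUnit;

  public class DailyOptimizeScheduler {

    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    public void start(final Iterable<String> tables) {
      // Run once every 24 hours, starting an hour after startup.
      scheduler.scheduleAtFixedRate(new Runnable() {
        @Override
        public void run() {
          for (String table : tables) {
            // Skip tables whose indexes have not changed in the last 24 hours
            // and hosts that do not have enough local disk for the merge.
            optimizeTable(table);
          }
        }
      }, 1, 24, TimeUnit.HOURS);
    }

    // Hypothetical hook: force a full merge of each shard in the table.
    protected void optimizeTable(String table) {
    }
  }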

Aaron


> --
> Ravi
>
>
>
>
> On Sun, Oct 13, 2013 at 12:51 AM, Aaron McCurry <[email protected]>
> wrote:
>
> > On Fri, Oct 11, 2013 at 11:12 AM, Ravikumar Govindarajan <
> > [email protected]> wrote:
> >
> > > I came across this interesting JIRA in hadoop
> > > https://issues.apache.org/jira/browse/HDFS-385
> > >
> > > In essence, it allows us more control over where blocks are placed.
> > >
> > > A BlockPlacementPolicy to optimistically write all blocks of a given
> > > index-file into the same set of datanodes could be highly helpful.
> > >
> > > Co-locating shard-servers and datanodes, along with short-circuit reads,
> > > should greatly improve performance. We can always utilize the file-system
> > > cache for local files. In case a given file is not served locally, the
> > > shard-servers can use the block-cache.
> > >
> > > Do you see some positives in such an approach?
> > >
> >
> > I'm sure that there will be some performance improvements by using local
> > reads for accessing HDFS.  The areas where I would expect to see the
> > biggest increases in performance would be merging and fetching data for
> > retrieval.  Even with access to local drives, the search time would likely
> > not be improved, assuming the index is hot.  One of the reasons for doing
> > short-circuit reads is to make use of the OS file system cache, and since
> > Blur already uses a filesystem cache it might just be duplicating
> > functionality.  Another big reason for short-circuit reads is to reduce
> > network traffic.
> >
> > Consider the scenario where another server has gone down.  If the shard
> > server is in an opening state, we would have to change the Blur layout
> > system to fail over only to the server that contains the data for the
> > index.  This might have some problems because the layout of the shards is
> > based on how many shards are online on the existing shard servers, not on
> > where the data is located.
> >
> > So in a perfect world, if the shard fails over to the correct server, it
> > would reduce the amount of network traffic (to near 0 if it was perfect).
> > So in short it could work, but it might be very hard.
> >
> > Assume for a moment that the system is not dealing with a failure and
> > that the shard server is also running a datanode.  HDFS writes the first
> > replica to the local datanode, so if we just configure Hadoop for
> > short-circuit reads we would already get the benefits for data fetching
> > and merging.
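
(For reference, a minimal client-side sketch of enabling short-circuit reads,
assuming a Hadoop 2.x style deployment; the class name and socket path are
just examples, and the datanodes also need the domain socket configured and
the native libhadoop library available.)

  import java.io.IOException;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;

  public class ShortCircuitReadConfig {
    public static FileSystem openWithShortCircuit() throws IOException {
      Configuration conf = new Configuration();
      // Let the HDFS client read block files directly from the local datanode.
      conf.setBoolean("dfs.client.read.shortcircuit", true);
      // Domain socket shared with the local datanode (example path).
      conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket");
      return FileSystem.get(conf);
    }
  }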
> >
> > I had another idea that I wanted to run by you.
> >
> > What if, instead of actively placing blocks during writes with a Hadoop
> > block placement policy, we write something like the Hadoop balancer?  We
> > get Hadoop to move the blocks to the shard server (datanode) that is
> > hosting the shard.  That way it would be asynchronous, and after a failure
> > or shard movement it would relocate the blocks and still use a
> > short-circuit read.  Kind of like the blocks following the shard around,
> > instead of deciding up front where the data for the shard should be
> > located.  If this is what you were thinking, I'm sorry for not
> > understanding your suggestion.
> >
> > Aaron
> >
> >
> >
> > >
> > > --
> > > Ravi
> > >
> >
>
