Your idea looks great, but I'm afraid I don't fully grasp it yet. I am
still trying to understand your suggestions, so please bear with me for
a moment.
When co-locating shard-servers and datanodes:
1. Every shard-server will attempt to write a local copy.
2. Proceed with Hadoop's default settings (remote rack, etc.) for the
remaining copies; the local replica can then be served with
short-circuit reads (see the configuration sketch below).
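For reference, these are the client-side settings I am assuming for
short-circuit reads. A minimal sketch only; property names vary across
Hadoop versions, and the socket path is just an example that must match
what the datanodes are configured with:

import org.apache.hadoop.conf.Configuration;

public class ShortCircuitConf {

  // Enable HDFS short-circuit local reads on the client side
  // (HDFS-347 style settings for Hadoop 2.x).
  public static Configuration withShortCircuitReads(Configuration conf) {
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    // Example path; must match dfs.domain.socket.path on the datanodes.
    conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket");
    return conf;
  }
}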
Broadly, there are two problems when writing a file:
1. Unable to write a local copy (lack of sufficient storage, etc.).
Hadoop internally redirects the write to some other destination.
2. Shard-server failure. Since the shard-server is stateless, this will
most likely be a hardware failure or planned maintenance.
Instead of asynchronously re-balancing after every write (moving blocks
to the shard), is it possible for us to trap only these two cases?
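Roughly, I was thinking of a detection pass like the one below. Just a
sketch: the class name is made up, the shard directory is passed in,
and matching on the local hostname is an assumption on my part.

import java.net.InetAddress;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Reports which index files of a shard have blocks with no replica on
 * this shard server's datanode. Only those files would need fixing,
 * instead of re-balancing on every write.
 */
public class ShardLocalityCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    String localHost = InetAddress.getLocalHost().getHostName();
    Path shardDir = new Path(args[0]); // HDFS path of one shard's index

    for (FileStatus file : fs.listStatus(shardDir)) {
      if (file.isDirectory()) {
        continue;
      }
      BlockLocation[] blocks =
          fs.getFileBlockLocations(file, 0, file.getLen());
      int nonLocal = 0;
      for (BlockLocation block : blocks) {
        if (!Arrays.asList(block.getHosts()).contains(localHost)) {
          nonLocal++;
        }
      }
      if (nonLocal > 0) {
        System.out.println(file.getPath() + ": " + nonLocal + "/"
            + blocks.length + " blocks have no local replica");
      }
    }
  }
}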
BTW, how do we re-arrange the shard layout when shards are added or
removed? I looked at the 0.20 code and it seems to move shards around
too much. That could be detrimental to what we are discussing now,
right?
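Also, if we go with your balancer-style idea below, then for a closed
(immutable) index file maybe a plain rewrite from the hosting shard
server is enough, since the default placement puts the first replica on
the local datanode. A very rough sketch; it assumes the file is no
longer open and ignores replication and block-size overrides:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

/**
 * Rewrites one immutable index file from the shard server that hosts
 * the shard. Because default placement puts the first replica on the
 * local datanode, the rewritten file becomes locally readable again.
 */
public class RelocateFile {
  public static void relocate(FileSystem fs, Path file, Configuration conf)
      throws Exception {
    Path tmp = new Path(file.getParent(), file.getName() + ".relocating");
    // The copy runs on this shard server, so replica #1 of every new
    // block lands on the local datanode.
    FileUtil.copy(fs, file, fs, tmp, false, conf);
    // Swap the copies; HDFS rename does not overwrite, so delete first.
    fs.delete(file, false);
    fs.rename(tmp, file);
  }
}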
--
Ravi
On Sun, Oct 13, 2013 at 12:51 AM, Aaron McCurry <[email protected]> wrote:
> On Fri, Oct 11, 2013 at 11:12 AM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > I came across this interesting JIRA in Hadoop:
> > https://issues.apache.org/jira/browse/HDFS-385
> >
> > In essence, it allows us more control over where blocks are placed.
> >
> > A BlockPlacementPolicy to optimistically write all blocks of a given
> > index-file into the same set of datanodes could be highly helpful.
> >
> > Co-locating shard-servers and datanodes, along with short-circuit
> > reads, should improve performance greatly. We can always utilize the
> > file-system cache for local files. In case a given file is not served
> > locally, the shard-servers can use the block cache.
> >
> > Do you see some positives in such an approach?
> >
>
> I'm sure that there will be some performance improvements by using local
> reads for accessing HDFS. The areas where I would expect the biggest
> gains are merging and fetching data for retrieval. Even with local
> drives, search time would likely not improve, assuming the index is hot.
> One of the reasons for doing short-circuit reads is to make use of the
> OS file system cache, and since Blur already uses a filesystem cache it
> might just be duplicating functionality. Another big reason for
> short-circuit reads is to reduce network traffic.
>
> Consider the scenario where another server has gone down. If the shard
> server is in an opening state, we would have to change the Blur layout
> system so that it fails over only to the server that contains the data
> for the index. This might be problematic because the shard layout is
> based on how many shard servers are online, not on where the shards'
> data is located.
>
> So in a perfect world, if the shard fails over to the correct server, it
> would reduce the amount of network traffic (to near zero if it were
> perfect). In short, it could work, but it might be very hard.
>
> Assume for a moment that the system is not dealing with a failure and
> that the shard server is also running a datanode. HDFS places the first
> replica on the local datanode, so if we just configure Hadoop for
> short-circuit reads we would already get the benefits for data fetching
> and merging.
>
> I had another idea that I wanted to run by you.
>
> What if, instead of actively placing blocks during writes with a Hadoop
> block placement policy, we write something like the Hadoop balancer? We
> get Hadoop to move the blocks to the shard server (datanode) that is
> hosting the shard. That way it would be asynchronous, and after a
> failure or shard movement it would relocate the blocks and still allow
> short-circuit reads. Kind of like the blocks following the shard around,
> instead of deciding up front where the data for the shard should be
> located. If this is what you were thinking, I'm sorry for not
> understanding your suggestion.
>
> Aaron
>
>
>
> >
> > --
> > Ravi
> >
>