Wow, this was thought of 2 years back!! Amazing. I checked the code. One nit: a favored node must be avoided if it is already marked as an excluded node.
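To make the nit concrete, the favored-nodes handling could filter like this before choosing targets (just a rough sketch; the helper class and method names are mine, only the DatanodeDescriptor/Node types are from Hadoop):

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

import org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor;
import org.apache.hadoop.net.Node;

// Sketch: drop any favored node the caller has already excluded, so an
// excluded node can never be picked just because it was also favored.
public final class FavoredNodeFilter {

  private FavoredNodeFilter() {
  }

  public static List<DatanodeDescriptor> dropExcluded(
      List<DatanodeDescriptor> favoredNodes, Set<Node> excludedNodes) {
    List<DatanodeDescriptor> usable = new ArrayList<DatanodeDescriptor>();
    if (favoredNodes == null) {
      return usable;
    }
    for (DatanodeDescriptor dn : favoredNodes) {
      if (excludedNodes == null || !excludedNodes.contains(dn)) {
        usable.add(dn); // safe to favor
      }
      // else: skip -- exclusion wins over the favored-nodes hint
    }
    return usable;
  }
}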
The use-case I stated above is a little more complex than this. In addition
to the primary node (the local machine serving the shard), we will now need
to instruct Hadoop to favor the secondary too during re-replication.

Also, we have a mixed-storage implementation: the 1st replica goes on the
SSD of the local node, the 2nd & 3rd on HDDs of random nodes. So we need to
handle that accordingly.
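To make that concrete, here is a rough sketch of what the write path could
look like (assuming Hadoop 2.6+; the favored-nodes create() overload is the
HDFS-2576 hint, and the built-in ONE_SSD storage policy keeps one replica on
SSD and the rest on DISK, which is close to what we want; the host names,
port and file path below are placeholders, not Blur code):

import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.hdfs.DistributedFileSystem;

// Sketch only: create an index file favoring the primary & secondary
// shard servers, with one replica pinned to SSD via a storage policy.
public class FavoredWriteSketch {

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path indexFile = new Path("/blur/tables/t1/shard-0/_0.cfs"); // placeholder
    DistributedFileSystem dfs =
        (DistributedFileSystem) indexFile.getFileSystem(conf);

    dfs.mkdirs(indexFile.getParent());
    // ONE_SSD: one replica on SSD, remaining replicas on DISK.
    dfs.setStoragePolicy(indexFile.getParent(), "ONE_SSD");

    // Favor the primary (local) & secondary shard servers -- placeholders.
    InetSocketAddress[] favored = new InetSocketAddress[] {
        new InetSocketAddress("primary-shard-server", 50010),
        new InetSocketAddress("secondary-shard-server", 50010) };

    FSDataOutputStream out = dfs.create(indexFile,
        FsPermission.getFileDefault(), true /* overwrite */,
        conf.getInt("io.file.buffer.size", 4096), (short) 3,
        dfs.getDefaultBlockSize(indexFile), null /* progress */, favored);
    try {
      out.write("index bytes go here".getBytes("UTF-8"));
    } finally {
      out.close();
    }
  }
}

Favored nodes are only a best-effort hint though, so the custom
BlockPlacementPolicy would still be needed to keep re-replication on the
designated secondary.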
But this is a great implementation to build on. Many thanks for this.

-- Ravi

On Fri, Apr 28, 2017 at 1:15 AM, Tim Williams <[email protected]> wrote:

> Have you looked in /contrib for the block placement stuff? Maybe it
> provides some ideas?
>
> https://git1-us-west.apache.org/repos/asf?p=incubator-blur.git;a=tree;f=contrib/blur-block-placement-policy;h=743a50d6431f4f8cecbb0f55d75baf187da7f755;hb=HEAD
>
> Thanks,
> --tim
>
> On Wed, Apr 26, 2017 at 9:40 AM, Ravikumar Govindarajan
> <[email protected]> wrote:
>
> >> In case of HDFS or MapR-FS can we dynamically assign shards to
> >> shard servers based on data locality (using block locations)?
> >
> > I was exploring the reverse option. Blur will suggest the set of
> > hadoop-datanodes to replicate to while writing index files.
> >
> > Blur will also explicitly control bootstrapping a new datanode &
> > load-balancing it, as well as removing a datanode from the cluster..
> >
> > Such fine control is possible by customizing the BlockPlacementPolicy API...
> >
> > Have started exploring it. The changes look big. Will keep the group
> > posted on progress.
> >
> > On Fri, Apr 21, 2017 at 10:42 PM, rahul challapalli
> > <[email protected]> wrote:
> >
> >> It's been a while since I looked at the code, but I believe a shard
> >> server has a list of shards which it can serve. Maintaining this static
> >> mapping (or tight coupling) between shard servers and shards is a design
> >> decision which makes complete sense for clusters where nodes do not
> >> share a distributed file system. In case of HDFS or MapR-FS can we
> >> dynamically assign shards to shard servers based on data locality
> >> (using block locations)? Obviously this hasn't been well thought out,
> >> as a lot of components would be affected. Just dumping a few thoughts
> >> from my brain.
> >>
> >> - Rahul
> >>
> >> On Fri, Apr 21, 2017 at 9:44 AM, Ravikumar Govindarajan
> >> <[email protected]> wrote:
> >>
> >> > We have been facing a lot of slowdown in production whenever a
> >> > shard-server is added or removed...
> >> >
> >> > Shards which were locally served via short-circuit reads suddenly
> >> > become fully remote &, at scale, this melts down.
> >> >
> >> > The block cache is a reactive cache & takes a lot of time to settle
> >> > down (at least for us!!).
> >> >
> >> > Have been thinking of handling this locality issue for some time now:
> >> >
> >> > 1. For every shard, Blur can map a primary server & a secondary
> >> > server in ZooKeeper.
> >> > 2. File-writes can use the favored-nodes hint of Hadoop & write to
> >> > both these servers [https://issues.apache.org/jira/browse/HDFS-2576].
> >> > 3. When a machine goes down, instead of randomly assigning shards to
> >> > different shard-servers, Blur can decide to allocate them to the
> >> > designated secondary servers.
> >> >
> >> > Adding a new machine is another problem, as it will immediately start
> >> > serving shards from remote machines. It needs local-disk copies of
> >> > all the primary shards it is supposed to serve.
> >> >
> >> > Hadoop has something called BlockPlacementPolicy that can be hacked
> >> > into
> >> > [http://hadoopblog.blogspot.in/2009/09/hdfs-block-replica-placement-in-your.html].
> >> >
> >> > When booting a new machine, let's say we increase the
> >> > replication-factor from 3 to 4 for the shards that will be hosted by
> >> > the new machine (setrep command from the hdfs console).
> >> >
> >> > Now Hadoop will call our CustomBlockPlacementPolicy class to arrange
> >> > the extra replication, where we can sneak in the new IP..
> >> >
> >> > Once all shards to be hosted by this new machine are replicated, we
> >> > can close these shards, update the mappings in ZK & open them. Data
> >> > will then be served locally.
> >> >
> >> > Similarly, when restoring the replication-factor from 4 to 3, our
> >> > CustomBlockPlacementPolicy class can hook up to ZK, find out which
> >> > node to delete the data from & proceed...
> >> >
> >> > Do let me know your thoughts on this...
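P.S. To put a little code behind the idea in the quoted mail above: the
CustomBlockPlacementPolicy could consult ZK to find which datanode should
receive the extra (4th) replica of a shard's files when replication is
bumped from 3 to 4. A very rough sketch, with an invented ZK layout and
helper names (nothing here is existing Blur or Hadoop code except the
ZooKeeper client calls):

import java.nio.charset.StandardCharsets;

import org.apache.zookeeper.ZooKeeper;

// Hypothetical helper for the custom placement policy: given the HDFS path
// of a file being re-replicated, look up in ZooKeeper which datanode the
// extra replica should be "sneaked" onto. The ZK layout is invented.
public class ShardReplicaTargets {

  private final ZooKeeper zk;

  public ShardReplicaTargets(ZooKeeper zk) {
    this.zk = zk;
  }

  // e.g. srcPath = "/blur/tables/t1/shard-0/_0.cfs" -> shard id "shard-0"
  static String shardOf(String srcPath) {
    String[] parts = srcPath.split("/");
    return parts.length >= 2 ? parts[parts.length - 2] : null;
  }

  // Returns the host of the datanode that should receive the extra replica
  // for this shard, or null if no bootstrap is in progress for it.
  public String extraReplicaTarget(String srcPath) throws Exception {
    String shard = shardOf(srcPath);
    if (shard == null) {
      return null;
    }
    String node = "/blur/replica-targets/" + shard; // invented ZK path
    if (zk.exists(node, false) == null) {
      return null;
    }
    byte[] data = zk.getData(node, false, null);
    return new String(data, StandardCharsets.UTF_8);
  }
}

The policy would call extraReplicaTarget(srcPath) from its chooseTarget()
and, if non-null, place the additional replica on that node; when
replication is restored to 3, the same mapping tells it which replica to
delete.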
